Abstract

Given a language L and a nondeterministic finite automaton M, we consider whether 
we can determine efficiently (in the size of M) if M accepts at least one word in L, or 
infinitely many words. Given that M accepts at least one word in L, we consider how 
long a shortest word can be. The languages L that we examine include the palindromes, 
the non-palindromes, the k-powers, the non-k-powers, the powers, the non-powers (also
called primitive words), the words matching a general pattern, the bordered words, and 
the unbordered words. 
1 Introduction



Let $L \subseteq \Sigma^*$ be a fixed language, and let $M$ be a deterministic finite automaton (DFA)
or nondeterministic finite automaton (NFA) with input alphabet $\Sigma$. In this paper we are
interested in three questions:

1. Can we efficiently decide (in terms of the size of $M$) if $L(M)$ contains at least one
element of $L$, that is, if $L(M) \cap L \neq \emptyset$?

2. Can we efficiently decide if $L(M)$ contains infinitely many elements of $L$, that is, if
$L(M) \cap L$ is infinite?

3. Given that $L(M)$ contains at least one element of $L$, what is a good upper bound on
a shortest element of $L(M) \cap L$?

We can also ask the same questions about $\overline{L}$, the complement of $L$.

As an example, consider the case where $\Sigma = \{a\}$, $L$ is the set of primes written in unary,
that is, $\{a^i : i \text{ is prime}\}$, and $M$ is an NFA with $n$ states.

To answer questions (1) and (2), we first rewrite $M$ in Chrobak normal form [5]. Chrobak
normal form consists of an NFA $M'$ with a "tail" of $O(n^2)$ states, followed by a single
nondeterministic choice to a set of disjoint cycles containing at most $n$ states. Computing
this normal form can be achieved in $O(n^5)$ steps by a result of Martinez [23].

Now we examine each of the cycles produced by this transformation. Each cycle accepts
a finite union of sets of the form $(a^t)^* a^c$, where $t$ is the size of the cycle and $c \le n^2 + n$; both
$t$ and $c$ are given explicitly from $M'$. Now, by Dirichlet's theorem on primes in arithmetic
progressions, $\gcd(t, c) = 1$ for at least one pair $(t, c)$ induced by $M'$ if and only if $M$ accepts
infinitely many elements of $L$. This can be checked in $O(n^2)$ steps, and so we get a solution
to question (2) in polynomial time.

Question (1) requires a little more work. From our answer to question (2), we may assume
that $\gcd(t, c) > 1$ for all pairs $(t, c)$, for otherwise $M$ accepts infinitely many elements of $L$
and hence at least one element. Each element in such a set is of length $kt + c$ for some $k \ge 0$.
Let $d = \gcd(t, c) \ge 2$. Then $kt + c = (kt/d + c/d)d$. If $k > 1$, this quantity is at least $2d$ and
hence composite. Thus it suffices to check the primality of $c$ and $t + c$, both of which are at
most $n^2 + 2n$. We can precompute the primes $\le n^2 + 2n$ in $O(n^2)$ time using a modification
of the sieve of Eratosthenes [26], and check if any of them are accepted. This gives a solution
to question (1) in polynomial time.
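
The two tests just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Chrobak normal form is assumed to have been computed already and is represented by a hypothetical list `pairs` of the $(t, c)$ values with $t \ge 1$, words accepted in the tail are ignored, and the function names are ours.

```python
from math import gcd, isqrt

def primes_up_to(limit):
    """Sieve of Eratosthenes: return the set of primes <= limit."""
    if limit < 2:
        return set()
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return {i for i, flag in enumerate(is_prime) if flag}

def accepts_infinitely_many_primes(pairs):
    """Question (2): by Dirichlet's theorem, some progression {kt + c}
    contains infinitely many primes iff gcd(t, c) = 1 for some pair."""
    return any(gcd(t, c) == 1 for t, c in pairs)

def accepts_some_prime(pairs):
    """Question (1): if every gcd(t, c) > 1, only c and t + c can be prime."""
    if not pairs:
        return False
    if accepts_infinitely_many_primes(pairs):
        return True
    primes = primes_up_to(max(t + c for t, c in pairs))
    return any(c in primes or t + c in primes for t, c in pairs)
```

For instance, the pair $(4, 2)$ describes the lengths $2, 6, 10, \ldots$, which contain exactly one prime, while $(6, 4)$ describes lengths that are all even and greater than 2.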

On the other hand, answering question (3) essentially amounts to estimating the size of 
the least prime in an arithmetic progression, an extremely difficult question that is still not 
fully resolved [13], although it is known that there is a polynomial upper bound.

Even the case where $L$ is regular can be difficult. Suppose $L$ is represented as the
complement of a language accepted by an NFA $M'$ with $n$ states. Then if $L(M) = \Sigma^*$,
question (1) amounts to asking if $L(M') \neq \Sigma^*$, which is PSPACE-complete [21 Section 10.6].
Question (2) amounts to asking if $\overline{L(M')}$ is infinite, which is also PSPACE-complete [18].
Question (3) amounts to asking for good bounds on the smallest string not accepted by an
NFA. There is an evident upper bound of $2^n$, and there are examples known that achieve
$2^{cn}$ for some constant $c > 0$, but more detailed analysis is still lacking [9].

Thus we see that asking these questions, even for relatively simple languages L, can 
quickly take us to the limits of what is known in formal language theory and number theory. 

In this paper we examine questions (1)-(3) in the case where $M$ is an NFA and $L$ is either
the set of palindromes, the set of $k$-powers, the set of powers, the set of words matching a
general pattern, the set of bordered words, or their complements.

In some of these cases, there is previous work. For example, Ito et al. [17] studied several
circumstances in which primitive words (non-powers) may appear in regular languages. As
a typical result in [17], we mention: "A DFA over an alphabet of 2 or more letters accepts a
primitive word iff it accepts one of length $\le 3n - 3$, where $n$ is the number of states of the
DFA". Horváth, Karhumäki and Kleijn [15] addressed the decidability problem of whether
a language accepted by an NFA is palindromic (i.e., every element is a palindrome). They
showed that the language accepted by an NFA with $n$ states is palindromic if and only if all
its words of length shorter than $3n$ are palindromes.

Here is a summary of the rest of the paper. In Section 2 we define the objects of study
and our notation.

In Section 3 we begin our study of palindromes. We give efficient algorithms to test if
an NFA accepts at least one palindrome, or infinitely many. We also show that a shortest
palindrome accepted is of length at most quadratic, and further, that quadratic examples
exist. In Section 4 we give efficient algorithms to test if an NFA accepts at least one
non-palindrome, or infinitely many. Further, we give a tight bound on the length of a shortest
non-palindrome accepted.

In Section 5 we begin our study of patterns. We show that it is PSPACE-complete to test
if a given NFA accepts a word matching a given pattern. As a special case of this problem
we consider testing if an NFA accepts a $k$-power. We give an algorithm to test if a $k$-power
is accepted that runs in polynomial time for fixed $k$. If $k$ is not fixed, the problem is
PSPACE-complete. We also study the problem of accepting a power of exponent $\ge k$, and of
accepting infinitely many $k$-powers.

In Section 6 we give a polynomial-time algorithm to decide if a non-$k$-power is accepted.
We also give upper and lower bounds on the length of a shortest $k$-power accepted. In
Section 7 we give an efficient algorithm for determining if an NFA accepts at least one
non-power. In Section 8 we bound the length of the smallest power. Section 9 gives some
additional results on powers.

In Section 10 we show how to test if an NFA accepts a bordered word, or infinitely many,
and show that a shortest bordered word accepted can be of quadratic length. In Section 11
we give an algorithm to test if an NFA accepts an unbordered word, or infinitely many, and
we establish a linear upper bound on the length of a shortest unbordered word.
2 Notions and notation



Let $\Sigma$ be an alphabet, i.e., a nonempty, finite set of symbols (letters). By $\Sigma^*$ we denote
the set of all finite words (strings of symbols) over $\Sigma$, and by $\epsilon$, the empty word (the word
having zero symbols). The operation of concatenation (juxtaposition) of two words $u$ and $v$
is denoted by $u \cdot v$, or simply $uv$. If $w \in \Sigma^*$ is written in the form $w = xy$ for some $x, y \in \Sigma^*$,
then the word $yx$ is said to be a conjugate of $w$.

For $w \in \Sigma^*$, we denote by $w^R$ the word obtained by reversing the order of symbols in $w$.
A palindrome is a word $w$ such that $w = w^R$. If $L$ is a language over $\Sigma$, i.e., $L \subseteq \Sigma^*$, we say
that $L$ is palindromic if every word $w \in L$ is a palindrome.

Let $k \ge 2$ be an integer. A word $y$ is a $k$-power if $y$ can be written as $y = x^k$ for some
non-empty word $x$. If $y$ cannot be so written for any $k \ge 2$, then $y$ is primitive. A 2-power
is typically referred to as a square, and a 3-power as a cube.

Patterns are a generalization of powers. A pattern is a non-empty word $p$ over a pattern
alphabet $\Delta$. The letters of $\Delta$ are called variables. A pattern $p$ matches a word $w \in \Sigma^*$ if
there exists a non-erasing morphism $h : \Delta^* \to \Sigma^*$ such that $h(p) = w$. Thus, a word $w$ is a
$k$-power if it matches the pattern $a^k$.

Bordered words are generalizations of powers. We say a word $x$ is bordered if there exist
words $u \in \Sigma^+$, $w \in \Sigma^*$ such that $x = uwu$. In this case, the word $u$ is said to be a border for
$x$. Otherwise, $x$ is unbordered.
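
The definitions above translate directly into small predicates on strings. The following Python sketch is only illustrative; the function names are ours, and words are modeled as Python strings.

```python
def is_palindrome(w):
    """w equals its reversal w^R."""
    return w == w[::-1]

def is_k_power(w, k):
    """w = x^k for some non-empty word x."""
    if not w or len(w) % k != 0:
        return False
    x = w[: len(w) // k]
    return x * k == w

def is_power(w):
    """w is a k-power for some k >= 2 (a non-empty w is primitive otherwise)."""
    return any(is_k_power(w, k) for k in range(2, len(w) + 1))

def is_bordered(w):
    """w = u v u for some non-empty u; such a u has length at most |w| / 2."""
    return any(w[:i] == w[-i:] for i in range(1, len(w) // 2 + 1))

def conjugates(w):
    """All words yx such that w = xy."""
    return {w[i:] + w[:i] for i in range(len(w))}
```

For example, `abab` is a square and is bordered with border `ab`, while `ab` is primitive and unbordered.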

A nondeterministic finite automaton (NFA) over $\Sigma$ is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$ where
$Q$ is a finite set of states, $\delta : Q \times \Sigma \to 2^Q$ is a next-state function, $q_0$ is an initial state and
$F \subseteq Q$ is a set of final states. We sometimes view $\delta$ as a transition table, i.e., as a set
consisting of tuples $(p, a, q)$ with $p, q \in Q$ and $a \in \Sigma$. The machine $M$ is deterministic
(DFA) if $\delta$ is a function mapping $Q \times \Sigma \to Q$. We consider only complete DFAs, that is,
those whose transition function is a total function. Sometimes we use NFA-$\epsilon$, which are
NFAs that also allow transitions on the empty word.

The size of $M$ is the total number $N$ of its states and transitions. When we want
to emphasize the components of $M$, we say $M$ has $n$ states and $t$ transitions, and define
$N := n + t$. The language of $M$, denoted by $L(M)$, belongs to the family of regular languages
and consists of those words accepted by $M$ in the usual sense. A successful path, or successful
computation of $M$ is any computation starting in the initial state and ending in a final state.
The label of a computation is the input word that triggered it; thus, the language of $M$ is
the set of labels of all successful computations of $M$.

A state of $M$ is accessible if there exists a path in the associated transition graph, starting
from $q_0$ and ending in that state. By convention, there exists a path from each state to itself
labeled with $\epsilon$. A state $q$ is coaccessible if there exists a path from $q$ to some final state. A
state which is both accessible and coaccessible is called useful, and if it is not coaccessible it
is called dead.

We note that if $M$ is an NFA or NFA-$\epsilon$, we can remove all states that are not useful in
linear time (in the number of states and transitions) using depth-first search. We observe
that $L(M) \neq \emptyset$ if and only if any states remain after this process, which can be tested in
linear time. Similarly, if $M$ is an NFA, then $L(M)$ is infinite if and only if the corresponding
digraph has a directed cycle. This can also be tested in linear time.

If $M$ is an NFA-$\epsilon$, then to check if $L(M)$ is infinite we need to know not only that the
corresponding digraph has a cycle, but that it has a cycle labeled by a non-empty word. This 
can also be checked in linear time as follows. Let us suppose that all non-useful states of M 
have been removed. We wish to test whether there is some edge of the digraph of M that is 
part of some cycle and is not labeled by the empty word. We now observe that an edge of 
a digraph belongs to a directed cycle if and only if both of its endpoints lie within the same 
strongly connected component. It is well known that the strongly connected components of 
a graph can be computed in linear time (see P Section 22.5]). Once the strongly connected 
components of the NFA-$\epsilon$ are known, we simply check the edges not labeled by $\epsilon$ to determine
if there is such an edge with both endpoints in the same strongly connected component. Thus 
we can determine if L(M) is infinite in linear time. 
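
The procedure just described can be sketched as follows. This is a minimal illustration, not a tuned implementation: the automaton is assumed to be given as a list of transition triples `(p, a, q)`, with the empty string `""` standing for an $\epsilon$-transition, and the strongly connected components are computed with Kosaraju's two-pass method (any linear-time method would do). All names are ours.

```python
def useful_states(transitions, start, finals):
    """States that are both accessible from start and coaccessible."""
    fwd, bwd = {}, {}
    for p, _, q in transitions:
        fwd.setdefault(p, []).append(q)
        bwd.setdefault(q, []).append(p)

    def reach(sources, graph):
        seen, stack = set(sources), list(sources)
        while stack:
            u = stack.pop()
            for v in graph.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    return reach([start], fwd) & reach(finals, bwd)

def scc_ids(nodes, edges):
    """Map each node to a representative of its strongly connected component."""
    fwd = {u: [] for u in nodes}
    bwd = {u: [] for u in nodes}
    for p, q in edges:
        fwd[p].append(q)
        bwd[q].append(p)
    order, seen = [], set()
    for root in nodes:                      # first pass: finishing order
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(fwd[v])))
                    break
            else:
                order.append(u)
                stack.pop()
    comp = {}
    for root in reversed(order):            # second pass: reversed graph
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            u = stack.pop()
            for v in bwd[u]:
                if v not in comp:
                    comp[v] = root
                    stack.append(v)
    return comp

def is_nonempty(transitions, start, finals):
    return bool(useful_states(transitions, start, finals))

def is_infinite(transitions, start, finals):
    """True iff some non-epsilon edge between useful states lies on a cycle."""
    useful = useful_states(transitions, start, finals)
    edges = [(p, a, q) for p, a, q in transitions if p in useful and q in useful]
    comp = scc_ids(useful, [(p, q) for p, _, q in edges])
    return any(a != "" and comp[p] == comp[q] for p, a, q in edges)
```

Note that a cycle consisting only of $\epsilon$-transitions does not make the language infinite, which is why the final check skips edges labeled `""`.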

Although the results of this paper are generally stated as applying to NFAs, by virtue
of the preceding algorithm, one sees that the results apply equally well to NFA-$\epsilon$'s.

We will also need the following well-known results [H] : 

Theorem 1. Let $M$ be an NFA with $n$ states. Then

(a) $L(M) \neq \emptyset$ if and only if $M$ accepts a word of length $< n$.

(b) $L(M)$ is infinite if and only if $M$ accepts a word of length $\ell$, $n \le \ell < 2n$.

If $L \subseteq \Sigma^*$ is a language, the Myhill-Nerode equivalence relation $\equiv_L$ is the equivalence
relation defined as follows: for $x, y \in \Sigma^*$, $x \equiv_L y$ if for all $z \in \Sigma^*$, $xz \in L$ if and only if
$yz \in L$. The classical Myhill-Nerode theorem asserts that if $L$ is regular, the equivalence
relation $\equiv_L$ has only finitely many equivalence classes.

For a background on finite automata and regular languages we refer the reader to Yu 



3 Testing if an NFA accepts at least one palindrome 

Over a unary alphabet, every string is a palindrome, so problems (1)-(3) become trivial. Let
us assume, then, that the alphabet $\Sigma$ contains at least two letters. Although the palindromes
over such an alphabet are not regular, the language

$\{x \in \Sigma^* : xx^R \in L(M) \text{ or there exists } a \in \Sigma \text{ such that } xax^R \in L(M)\}$

is, in fact, regular, as is often shown in a beginning course in formal languages [Til p. 72, 
Exercise 3.4 (h)]. We can take advantage of this as follows: 

Lemma 2. Let $M$ be an NFA with $n$ states and $t$ transitions. Then there exists an NFA-$\epsilon$
$M'$ with $n^2 + 1$ states and $\le 2t^2$ transitions such that

$L(M') = \{x \in \Sigma^* : xx^R \in L(M) \text{ or there exists } a \in \Sigma \text{ such that } xax^R \in L(M)\}$.

Proof. Let $M = (Q, \Sigma, \delta, q_0, F)$ be an NFA with $n$ states. We construct an NFA-$\epsilon$
$M' = (Q', \Sigma, \delta', q'_0, F')$ as follows: We let $Q' = Q \times Q \cup \{q'_0\}$, where $q'_0$ is the new initial state, and
we define the set of final states by

$F' = \{[p, p] : p \in Q\} \cup \{[p, q] : \text{there exists } a \in \Sigma \text{ such that } q \in \delta(p, a)\}$.

The transition function $\delta'$ is defined as follows:

$\delta'(q'_0, \epsilon) = \{[q_0, q] : q \in F\}$

and

$\delta'([p, q], a) = \{[r, s] : r \in \delta(p, a) \text{ and } q \in \delta(s, a)\}$.

It is clear that $M'$ accepts the desired language and consists of at most $n^2 + 1$ states and
$2t^2$ transitions. □

Corollary 3. Given an NFA $M$ with $n$ states and $t$ transitions, we can determine if $M$
accepts a palindrome in $O(n^2 + t^2)$ time.

Proof. We create $M'$ as in the proof of Lemma 2 and remove all states that are not useful,
and their associated transitions. Now $M$ accepts at least one palindrome if and only if
$L(M') \neq \emptyset$, which can be tested in time linear in the number of transitions and states of
$M'$. □
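
As an illustration, the pair construction can be explored on the fly with a breadth-first search instead of building $M'$ explicitly. The sketch below assumes the NFA is given as a dictionary `delta` mapping `(state, letter)` to a set of states; this representation and the function name are our own choices. It returns the length of a shortest accepted palindrome, or `None` if there is none.

```python
from collections import deque

def shortest_palindrome_length(delta, alphabet, start, finals):
    """Search pairs [p, q]: p is reached from start on x, and q reaches a
    final state on x^R. The pair is accepting if p == q (even palindrome)
    or q is in delta(p, a) for some letter a (odd palindrome)."""
    back = {}  # back[(q, a)] = set of states s with q in delta(s, a)
    for (s, a), targets in delta.items():
        for q in targets:
            back.setdefault((q, a), set()).add(s)

    dist = {(start, f): 0 for f in finals}
    queue = deque(dist)
    best = None
    while queue:
        p, q = queue.popleft()
        d = dist[(p, q)]
        candidate = None
        if p == q:
            candidate = 2 * d
        elif any(q in delta.get((p, a), ()) for a in alphabet):
            candidate = 2 * d + 1
        if candidate is not None and (best is None or candidate < best):
            best = candidate
        for a in alphabet:
            for r in delta.get((p, a), ()):
                for s in back.get((q, a), ()):
                    if (r, s) not in dist:
                        dist[(r, s)] = d + 1
                        queue.append((r, s))
    return best
```

For example, on an automaton for $(aa)^+ b a^+$ the search finds the palindrome $aabaa$ of length 5.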

From Lemma 2 we obtain two other interesting corollaries.

Corollary 4. Given an NFA $M$, we can determine if $L(M)$ contains infinitely many
palindromes in quadratic time.

Proof. We create $M'$ as in the proof of Lemma 2 and remove all states that are not useful,
and their associated transitions. $M$ accepts infinitely many palindromes if and only if $L(M')$
is infinite, which can be tested in linear time, as described in Section 2. □

Corollary 5. If an NFA $M$ accepts at least one palindrome, it accepts a palindrome of length
$\le 2n^2 - 1$.

Proof. Suppose $M$ accepts at least one palindrome. Then $M'$, as in Lemma 2, accepts a
word. Although $M'$ has $n^2 + 1$ states, the only transition from the initial state $q'_0$ is an
$\epsilon$-transition to one of the other $n^2$ states. Thus if $M'$ accepts a word, it must accept a word
$w$ of length $\le n^2 - 1$. Then $M$ accepts either $ww^R$ or $waw^R$, and both are palindromes, so $M$
accepts a palindrome of length at most $2(n^2 - 1) + 1 = 2n^2 - 1$. □

For a different proof of this corollary, see Rosaz [28].

We observe that the quadratic bound is tight, up to a multiplicative constant, in the case 
of alphabets with at least two letters, and even for DFAs: 

Proposition 6. For infinitely many $n$ there exists a DFA $M_n$ over a 2-letter
alphabet such that

(a) $M_n$ has $n$ states;

(b) The shortest palindrome accepted by $M_n$ is of length $\ge n^2/2 - 3n + 5$.

Proof. For $t \ge 2$, consider the language $L_t = (a^t)^+ b (a^{t-1})^+$. This language evidently can be
accepted by a DFA with $n = 2t + 2$ states. For a word $w \in L_t$ to be a palindrome, we must
have $w = a^{c_1 t} b a^{c_2 (t-1)}$, for some integers $c_1, c_2 \ge 1$, with $c_1 t = c_2 (t - 1)$. Since $t$ and $t - 1$
are relatively prime, we must have $t - 1 \mid c_1$ and $t \mid c_2$. Thus the shortest palindrome in $L_t$
is $a^{t(t-1)} b a^{t(t-1)}$, which is of length $2t^2 - 2t + 1 = n^2/2 - 3n + 5$. □
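
For completeness, the final identity in the proof can be checked by substituting $n = 2t + 2$; the shortest palindrome has $t(t-1)$ letters $a$ on each side of the single $b$:

```latex
\begin{align*}
\left| a^{t(t-1)}\, b\, a^{t(t-1)} \right| &= 2t(t-1) + 1 = 2t^2 - 2t + 1, \\
\frac{n^2}{2} - 3n + 5 &= \frac{(2t+2)^2}{2} - 3(2t+2) + 5 \\
  &= (2t^2 + 4t + 2) - (6t + 6) + 5 \\
  &= 2t^2 - 2t + 1 .
\end{align*}
```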

4 Testing if an NFA accepts at least one non-palindrome 

In this section we consider the problem of deciding if an NFA accepts at least one non- 
palindrome. Evidently, if an NFA fails to accept a non-palindrome, it must accept nothing 
but palindromes, and so we discuss the opposite decision problem, 

Given an NFA M, is L(M) palindromic? 

Again, the problem is trivial for a unary alphabet, so we assume $|\Sigma| \ge 2$.
Horváth, Karhumäki, and Kleijn [15] proved that the question is recursively solvable. In
particular, they proved the following theorem:

Theorem 7. $L(M)$ is palindromic if and only if $\{x \in L(M) : |x| < 3n\}$ is palindromic,
where $n$ is the number of states of $M$.

While a naive implementation of Theorem 7 would take exponential time, in this section
we show how to test palindromicity in polynomial time. We also show the bound of $3n$ in
Theorem 7 is tight for NFAs, and we improve the bound for DFAs.

First, we show how to construct a "small" NFA $M'_s$, for some integer $s \ge 1$, that has the
following properties:

(a) no word in $L(M'_s)$ is a palindrome;

(b) $M'_s$ accepts all non-palindromes of length $< s$ (in addition to some other non-palindromes).

The idea in this construction is the following: on input $w$ of length $r < s$, we "guess" an
index $i$, $1 \le i \le r/2$, such that $w[i] \neq w[r + 1 - i]$. We then "verify" that there is indeed
a mismatch $i$ characters from each end. We can re-use states, as illustrated in Figure 1 for
the case $\Sigma = \{a, b, c\}$ and $s = 10$.
Figure 1: Accepting non-palindromes over {a, b, c} for s = 10.

The resulting NFA $M'_s$ has $O(|\Sigma| s)$ states and $O(|\Sigma|^2 s)$ transitions. A similar construction
appears in [31].

Given an NFA $M$ with $n$ states, we now construct the cross-product with $M'_{3n}$, and
obtain an NFA $A$ that accepts $L(M) \cap L(M'_{3n})$. We claim that $L(A) = \emptyset$ if and only if
$L(M)$ is palindromic. For if $L(A) = \emptyset$, then $M$ accepts no non-palindrome of length $< 3n$,
and so by Theorem 7, $L(M)$ is palindromic. If $L(A) \neq \emptyset$, then since $L(M'_{3n})$ contains only
non-palindromes, we see that $L(M)$ is not palindromic.

We can determine if $L(A) = \emptyset$ efficiently by adding a new final state $q_f$ and $\epsilon$-transitions
from all the final states of $A$ to $q_f$, then performing a depth-first search to detect whether
there are any paths from $q_0$ to $q_f$. This can be done in time linear in the number of states
and transitions of $A$. If $M$ has $n$ states and $t$ transitions, then $A$ has $O(n^2)$ states and $O(tn)$
transitions. Hence we have proved the following theorem.

Theorem 8. Let $M$ be an NFA with $n$ states and $t$ transitions. The algorithm sketched
above determines whether $M$ accepts a palindromic language in $O(n^2 + tn)$ time.

A different method runs slightly slower, but allows us to do a little more. We can mimic
the construction for palindromes in Section 3, but adapt it for non-palindromes. Given an
NFA $M$, we construct an NFA-$\epsilon$ $M'$ that accepts the language

$\{x \in \Sigma^* : \text{there exist } x' \in \Sigma^*, a \in \Sigma \text{ such that } |x| = |x'|,\ x \neq x'^R, \text{ and } xx' \in L(M) \text{ or } xax' \in L(M)\}$.

The construction is similar to that in Lemma 2. On input $x$, we simulate $M$ on $xx'$ and $xax'$
symbol-by-symbol, moving forward from the start state and backward from a final state. We
need an additional boolean "flag" for each state to record whether or not we have processed
a character in $x'$ that would mismatch the corresponding character in $x$. If $M$ has $n$ states
and $t$ transitions, this construction produces an NFA-$\epsilon$ $M'$ with $\le 1 + 2n^2$ states and $O(t^2)$
transitions. From this we get, in analogy with Corollary 4, the following proposition.

Proposition 9. Given an NFA $M$ with $n$ states and $t$ transitions, we can determine in
$O(n^2 + t^2)$ time if $M$ accepts infinitely many non-palindromes.
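
The flagged pair construction can be sketched as a search over triples (forward state, backward state, mismatch flag). The code below only decides whether at least one non-palindrome is accepted; deciding whether infinitely many are accepted would additionally require the cycle test of Section 2 on the same state space. The dictionary representation of `delta` and the function name are assumptions made for this illustration.

```python
from collections import deque

def accepts_non_palindrome(delta, alphabet, start, finals):
    """Search triples (p, q, flag): p is reached from start on x, q reaches
    a final state on x', |x| = |x'|, and flag records whether some letter
    of x differs from the letter of x' at the mirrored position."""
    back = {}  # back[(q, b)] = set of states s with q in delta(s, b)
    for (s, b), targets in delta.items():
        for q in targets:
            back.setdefault((q, b), set()).add(s)

    seen = {(start, f, False) for f in finals}
    queue = deque(seen)
    while queue:
        p, q, flag = queue.popleft()
        # Accept x x' (p == q) or x a x' (one middle letter) once a mismatch exists.
        if flag and (p == q or any(q in delta.get((p, c), ()) for c in alphabet)):
            return True
        for a in alphabet:                      # next letter of x, read forward
            for r in delta.get((p, a), ()):
                for b in alphabet:              # mirrored letter of x', read backward
                    for s in back.get((q, b), ()):
                        state = (r, s, flag or a != b)
                        if state not in seen:
                            seen.add(state)
                            queue.append(state)
    return False
```

A return value of `False` means that the language of the automaton is palindromic.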




We now turn to the question of the optimality of the $3n$ bound given in Theorem 7. For
an NFA over an alphabet of at least 2 symbols, the bound is indeed optimal, as the following
example shows.

Proposition 10. Let $\Sigma$ be an alphabet of at least two symbols, containing the letters $a$ and
$b$. For $n \ge 1$ define $L_n = (a^{n-1}\Sigma)^* a^{n-1}$. Then $L_n$ can be accepted by an NFA with $n$ states
and a shortest non-palindrome in $L_n$ is $a^{n-1} a a^{n-1} b a^{n-1}$.

Proof. The details are straightforward. □
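
For small $n$ the claim can be checked by brute force. The sketch below assumes $\Sigma = \{a, b\}$ and tests membership in $L_n$ directly from its definition rather than through an automaton; the function names are ours.

```python
from itertools import product

def in_L(w, n):
    """Membership in L_n = (a^{n-1} Sigma)^* a^{n-1}: the length is n - 1
    modulo n, and only every n-th position may hold a letter other than a."""
    if len(w) % n != n - 1:
        return False
    return all(ch == "a" for i, ch in enumerate(w) if i % n != n - 1)

def shortest_non_palindrome(n, max_len, alphabet="ab"):
    """First non-palindrome of L_n in length-lexicographic order, if any."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if in_L(w, n) and w != w[::-1]:
                return w
    return None
```

The word found has length $3n - 1$, so it is accepted by an $n$-state NFA and is shorter than $3n$, as the bound of Theorem 7 requires.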

For DFAs, however, the bound of $3n$ can be improved to $3n - 3$. To show this, we first
prove the following lemma. A language $L$ is called slender if there is a constant $C$ such
that, for all $n \ge 0$, the number of words of length $n$ in $L$ is less than $C$. The following
characterization of slender regular languages has been independently rediscovered several 
times [201 EH [25]. 

Theorem 11. Let $L \subseteq \Sigma^*$ be a regular language. Then $L$ is slender if and only if it can be
written as a finite union of languages of the form $uv^*w$, where $u, v, w \in \Sigma^*$.

Next we prove the following useful lemma concerning DFAs accepting slender languages. 

Lemma 12. Let L be a slender language accepted by a DFA M with n states, over an 
alphabet of two or more symbols. Then M must have a dead state. 

Proof. Without loss of generality, assume that every state of $M = (Q, \Sigma, \delta, q_0, F)$ is reachable
from $q_0$, and that $\Sigma$ contains the symbols $a$ and $b$. We distinguish two cases:

1. $M$ accepts a finite language. Consider the states reached from $q_0$ on $a, a^2, a^3, \ldots$
Eventually some state $q$ must be repeated. This state $q$ must be a dead state, for if
not, $M$ would accept an infinite language.

2. $M$ accepts an infinite language. Then $M$ has at least one fruitful cycle, that is, a cycle
that produces infinitely many words in $L(M)$ as labels of paths starting at $q_0$, entering
the cycle, going around the cycle some number of times, then exiting and eventually
reaching a final state. Let $C_1$ be one fruitful cycle, and consider the following successful
path involving $C_1$:
$q_0 \xrightarrow{\alpha} q \xrightarrow{u} q \xrightarrow{\beta} f$,
where $f \in F$ and the repetition of $q$ denotes the
cycle $C_1$, labeled with $u$. Without loss of generality assume the first letter of $u$ is $a$.
Since $M$ is complete, denote $p = \delta(q, b)$.

We claim that from $p$ one cannot reach a fruitful cycle $C_2$. Indeed, let us assume the
contrary; this means that there exists a successful path
$q_0 \xrightarrow{\alpha} q \xrightarrow{u} q \xrightarrow{\gamma} r \xrightarrow{v} r \xrightarrow{\mu} f'$,
with $f' \in F$, where $\gamma$ begins with $b$ and the repetition of $r$ denotes the cycle $C_2$ labeled with $v$. Let $n$ be an
arbitrary integer, and $0 \le i \le n$. There exist two integers $k, l$ such that $k|u| = l|v| = m$.
With this notation, observe that the words $\alpha u^{k(n-i)} \gamma v^{l(n+i)} \mu$ are all accepted by $M$ and
have the same length $2mn + |\alpha\gamma\mu|$. Since there are $n + 1$ such words, this proves that
$L(M)$ has $\Omega(n)$ words of length $n$ for large $n$, a contradiction.

Thus, there exist a finite number of successful paths starting from $p$. However, considering
the states reached from $p$ by the words $a, a^2, a^3, \ldots$, one such state must repeat.
This state is dead, for the alternative would contradict the finiteness of successful paths
from $p$.

□ 

Corollary 13. If $M$ is a DFA over an alphabet of at least two letters and $L(M)$ is
palindromic, then $M$ has a dead state.

Proof. If $L(M)$ is palindromic, then by [15, Theorem 8] it can be written as a finite union of
languages of the form $uv(tv)^*u^R$, where $u, v, t \in \Sigma^*$ and $v, t$ are palindromes. By Theorem 11
this means $L(M)$ is slender. By Lemma 12, $M$ has a dead state. □

We are now ready to prove the improved bound of $3n - 3$ for DFAs.

Theorem 14. Let $M$ be a DFA with $n$ states. Then $L(M)$ is palindromic if and only if
$\{x \in L(M) : |x| < 3n - 3\}$ is palindromic.

Proof. One direction is clear.

If $M = (Q, \Sigma, \delta, q_0, F)$ is over a unary alphabet, then $L(M)$ is always palindromic, so the
criterion is trivially true.

Otherwise $M$ is over an alphabet of at least two letters. Assume $\{x \in L(M) : |x| < 3n - 3\}$
is palindromic. From Corollary 13, we see that $M$ must have a dead state. But then
we can delete such a dead state and all associated transitions, and all states reachable from
the deleted dead state, to get a new NFA $M'$ with at most $n - 1$ states that accepts the same
language. We know from Theorem 7 that the palindromicity of $\{x \in L(M') : |x| < 3n - 3\}$
implies that $L(M') = L(M)$ is palindromic. □

Finally, we observe that $3n - 3$ is the best possible bound in the case of DFAs. To do
so, we simply use the language $L_n$ from Proposition 10 and observe it can be accepted by a
DFA with $n + 1$ states; yet the shortest non-palindrome is of size $3n - 1$.

We end this section by noting that the related, but fundamentally different, problem of 
testing if $L = L^R$ was shown by Hunt [TB] to be PSPACE-complete.

5 Testing if an NFA accepts a word matching a pattern 

In this section we consider the computational complexity of testing if an NFA accepts a word 
matching a given pattern. Specifically, we consider the following decision problem. 

NFA PATTERN ACCEPTANCE 

INSTANCE: An NFA $M$ over the alphabet $\Sigma$ and a pattern $p$ over some alphabet
$\Delta$.

QUESTION: Does there exist $x \in \Sigma^+$ such that $x \in L(M)$ and $x$ matches $p$?
Since the pattern $p$ is given as part of the input, this problem is actually somewhat more
general than the sort of problem formulated as Question 1 of the introduction, where the
language $L$ was fixed.

We first consider the following result of Restivo and Salemi [27] (a more detailed proof
appears in [1]). We give here a boolean matrix based proof (see Zhang [31] for a study of
this boolean matrix approach to automata theory) that illustrates our general approach to
the other problems treated in this section.

Theorem 15 (Restivo and Salemi). Let $L$ be a regular language and let $\Delta$ be an alphabet.
The set $P_\Delta$ of all non-empty patterns $p \in \Delta^*$ such that $p$ matches a word in $L$ is effectively
regular.

Proof. Let $M = (Q, \Sigma, \delta, q_0, F)$ be an NFA such that $L(M) = L$. Suppose that
$Q = \{0, 1, \ldots, n - 1\}$. For $a \in \Sigma$, let $B_a$ be the $n \times n$ boolean matrix whose $(i, j)$ entry is 1 if
$j \in \delta(i, a)$ and 0 otherwise. Let $\mathcal{B}$ denote the semigroup generated by the $B_a$'s along with
the identity matrix. For $w = w_0 w_1 \cdots w_s$, where $w_i \in \Sigma$ for $i = 0, \ldots, s$, we write $B_w$ to
denote the matrix product $B_{w_0} B_{w_1} \cdots B_{w_s}$.

Without loss of generality, let $\Delta = \{1, 2, \ldots, k\}$. Observe that there exists a non-empty
pattern $p = p_0 p_1 \cdots p_r$, where $p_i \in \Delta$ for $i = 0, \ldots, r$, and a non-erasing morphism
$h : \Delta^* \to \Sigma^*$ such that $h(p) \in L$ if and only if there exist $k$ boolean matrices $B_1, \ldots, B_k \in \mathcal{B}$ such that
$B_i = B_{h(i)}$ for $i \in \Delta$ and $B = B_{p_0} B_{p_1} \cdots B_{p_r}$ describes an accepting computation of $M$.

We construct an NFA $M' = (Q', \Delta, \delta', P, F')$ for $P_\Delta$ as follows. For simplicity, we permit
$M'$ to have multiple initial states, as specified by the set $P$. We define $Q' = \mathcal{B}^{k+1}$. The set
$P$ of initial states is given by $P = \mathcal{B}^k \times I$, where $I$ denotes the identity matrix. In other
words, the NFA $M'$ uses the first $k$ components of its state to record an initial guess of $k$
boolean matrices $B_1, \ldots, B_k \in \mathcal{B}$. Let $[B_1, \ldots, B_k, A]$ denote some arbitrary state of $M'$. For
$i = 1, \ldots, k$, the transition function $\delta'$ maps $[B_1, \ldots, B_k, A]$ to $[B_1, \ldots, B_k, AB_i]$ on reading
the letter $i$. In other words, on input $p = p_0 p_1 \cdots p_r \in \Delta^*$, $M'$ uses the last component of its state to compute
the product $B = B_{p_0} B_{p_1} \cdots B_{p_r}$. The set $F'$ of final states of $M'$ consists of all states of the
form $[B_1, \ldots, B_k, B]$, where the matrix $B$ contains a 1 in some entry $(0, j)$, where $j \in F$. In
other words, $M'$ accepts if and only if $B$ describes an accepting computation of $M$. □
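
The boolean matrix view can be made concrete with a small brute-force sketch. It is exponential in general and is meant only to illustrate the idea on tiny automata. Two assumptions are ours: the states are $0, \ldots, n-1$ with `delta` given as a dictionary from `(state, letter)` to a set of states, and, because the morphism is non-erasing, we enumerate only the matrices $B_w$ of non-empty words $w$. Patterns are strings whose characters are the variables.

```python
from itertools import product

def transition_matrix(delta, n, a):
    """B_a: entry (i, j) is 1 iff j is in delta(i, a)."""
    return tuple(tuple(1 if j in delta.get((i, a), ()) else 0 for j in range(n))
                 for i in range(n))

def bool_mul(A, B):
    """Boolean matrix product."""
    n = len(A)
    return tuple(tuple(int(any(A[i][m] and B[m][j] for m in range(n)))
                       for j in range(n)) for i in range(n))

def nonempty_word_matrices(delta, n, alphabet):
    """All matrices B_w for non-empty words w (closure under right products)."""
    gens = [transition_matrix(delta, n, a) for a in alphabet]
    found = set(gens)
    frontier = list(found)
    while frontier:
        X = frontier.pop()
        for G in gens:
            Y = bool_mul(X, G)
            if Y not in found:
                found.add(Y)
                frontier.append(Y)
    return found

def accepts_word_matching(delta, n, alphabet, start, finals, pattern):
    """Guess one matrix per variable and multiply along the pattern."""
    variables = sorted(set(pattern))
    matrices = list(nonempty_word_matrices(delta, n, alphabet))
    for choice in product(matrices, repeat=len(variables)):
        assign = dict(zip(variables, choice))
        B = assign[pattern[0]]
        for v in pattern[1:]:
            B = bool_mul(B, assign[v])
        if any(B[start][f] for f in finals):
            return True
    return False
```

For an automaton accepting only the word $abab$, the pattern `xx` is matched (take $x \mapsto ab$), while `xxx` is not.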

By considering unary patterns of the form $a^k$, we obtain the following corollary of
Theorem 15.

Corollary 16. Let $L \subseteq \Sigma^*$ be a regular language. The set of exponents $k$ such that $L$
contains a $k$-power is the union of a finite set with a finite union of arithmetic progressions.
Further, this set of exponents is effectively computable.

Observe that Theorem 15 implies the decidability of the NFA PATTERN ACCEPTANCE
problem. We prove the following stronger result.

Theorem 17. The NFA PATTERN ACCEPTANCE problem is PSPACE-complete.
Proof. We first show that the problem is in PSPACE. By Savitch's theorem [29] it suffices
to give an NPSPACE algorithm. Let $M = (Q, \Sigma, \delta, q_0, F)$, where $Q = \{0, 1, \ldots, n - 1\}$.
For $a \in \Sigma$, let $B_a$ be the $n \times n$ boolean matrix whose $(i, j)$ entry is 1 if $j \in \delta(i, a)$ and
0 otherwise. Let $\mathcal{B}$ denote the semigroup generated by the $B_a$'s along with the identity matrix.
For $w = w_0 w_1 \cdots w_s \in \Sigma^*$, we write $B_w$ to denote the matrix product $B_{w_0} B_{w_1} \cdots B_{w_s}$.

Let $\Delta$ be the set of letters occurring in $p$. We may suppose that $\Delta = \{1, 2, \ldots, k\}$. First,
we non-deterministically guess $k$ boolean matrices $B_1, \ldots, B_k$. Next, for each $i$, we verify
that $B_i$ is in the semigroup $\mathcal{B}$ by non-deterministically guessing a word $w = w_0 w_1 \cdots w_s$ such
that $B_i = B_w$. Since there are at most $2^{n^2}$ possible $n \times n$ boolean matrices, we may assume
that $s \le 2^{n^2}$. We thus guess $w$ symbol-by-symbol and compute a sequence of matrices,
reusing space after performing each matrix multiplication. We maintain an $O(n^2)$-bit counter
to keep track of the length $s$ of our guessed word $w$. If $s$ exceeds $2^{n^2}$, we reject on this branch
of the non-deterministic computation.

Finally, if $p = p_0 p_1 \cdots p_r$, we compute the matrix product $B = B_{p_0} B_{p_1} \cdots B_{p_r}$ and accept
if and only if $B$ describes an accepting computation of $M$.

To show hardness we reduce from the following PSPACE-complete problem [10, Problem
AL6].

DFA INTERSECTION

INSTANCE: An integer $k \ge 1$ and $k$ DFAs $A_1, A_2, \ldots, A_k$, each over the alphabet
$\Sigma$.

QUESTION: Does there exist $x \in \Sigma^*$ such that $x$ is accepted by each $A_i$, $1 \le i \le k$?

Let $\#$ be a symbol not in $\Sigma$. We construct, in linear time, a DFA $M$ to accept the
language $L(A_1) \# L(A_2) \# \cdots L(A_k) \#$. Any word in $L(M)$ matching the pattern $a^k$ is of
the form $(x\#)^k$. It follows that $M$ accepts a word matching $a^k$ if and only if there exists $x$
such that $x \in L(A_i)$ for $1 \le i \le k$. This completes the reduction. □

We may define various variations or special cases of the NFA PATTERN ACCEPTANCE
problem, such as: NFA ACCEPTS A $k$-POWER, NFA ACCEPTS A $\ge k$-POWER,
NFA ACCEPTS INFINITELY MANY $k$-POWERS, NFA ACCEPTS
INFINITELY MANY $\ge k$-POWERS, etc. We define and consider the computational
complexity of these variations below.

NFA ACCEPTS A $k$-POWER.

INSTANCE: An NFA $M$ over the alphabet $\Sigma$ and an integer $k \ge 2$.
QUESTION: Does there exist $x \in \Sigma^+$ such that $M$ accepts $x^k$?

NFA ACCEPTS A $\ge k$-POWER.

INSTANCE: An NFA $M$ over the alphabet $\Sigma$.

QUESTION: Does there exist $x \in \Sigma^+$ and an integer $\ell \ge k$ such that $M$ accepts
$x^\ell$?

The NFA ACCEPTS A $\ge k$-POWER problem is actually an infinite family of problems,
each indexed by an integer $k \ge 2$. If $k$ is fixed, the NFA ACCEPTS A $k$-POWER
problem can be solved in polynomial time, as we now demonstrate.

Proposition 18. Let $M$ be an NFA with $n$ states and $t$ transitions, and set $N = n + t$, the size
of $M$. For any fixed integer $k \ge 2$, there is an algorithm running in $O(n^{2k-1} t^k) = O(N^{2k-1})$
time to determine if $M$ accepts a $k$-power.

Proof. For a language $L \subseteq \Sigma^*$, we define

$L^{1/k} = \{x \in \Sigma^* : x^k \in L\}$.

Let $M = (Q, \Sigma, \delta, q_0, F)$ be an NFA with $n$ states. We will construct an NFA-$\epsilon$ $M'$ such
that $L(M') = L(M)^{1/k}$. To determine whether or not $M$ accepts a $k$-power, it suffices to
check whether or not $M'$ accepts a non-empty word.

The idea behind the construction of $M'$ is as follows. On input $x$, $M'$ first guesses $k - 1$
states $g_1, g_2, \ldots, g_{k-1} \in Q$ and then checks that

• $g_1 \in \delta(q_0, x)$,

• $g_{i+1} \in \delta(g_i, x)$ for $i = 1, 2, \ldots, k - 2$, and

• $\delta(g_{k-1}, x) \cap F \neq \emptyset$.

It is clear that such states $g_1, g_2, \ldots, g_{k-1}$ exist if and only if $x^k \in L(M)$.

Formally, the construction of $M'$ is as follows. We define the NFA $M' = (Q', \Sigma, \delta', q'_0, F')$
such that:

• $Q' = \{q'_0\} \cup Q^{2k-1}$. That is, except for $q'_0$, each state of $M'$ is a $(2k-1)$-tuple of the
form $[g_1, g_2, \ldots, g_{k-1}, p_0, p_1, \ldots, p_{k-1}]$. The state $g_i$ represents the $i$-th state guessed
from $M$. The NFA $M'$ will simulate in parallel the computations of $M$ on input $x$
starting from states $q_0, g_1, g_2, \ldots, g_{k-1}$ respectively. The state $p_0$ represents the current
state of the simulation beginning from state $q_0$, and the states $p_1, p_2, \ldots, p_{k-1}$ represent
the current states of the simulations beginning from states $g_1, g_2, \ldots, g_{k-1}$, respectively.

• $q'_0$ is an additional state not in $Q^{2k-1}$. This state will have outgoing $\epsilon$-transitions for
each different combination of guesses $g_i$. The transition function on the start state is
defined as

$$\delta'(q'_0, \epsilon) = \{[g_1, g_2, \ldots, g_{k-1}, q_0, g_1, g_2, \ldots, g_{k-1}] : \forall i \in \{1, 2, \ldots, k-1\},\ g_i \in Q\}.$$

• We define the transition function $\delta'$ on all other states as:

$$\delta'([g_1, g_2, \ldots, g_{k-1}, p_0, p_1, \ldots, p_{k-1}], a) =
\{[g_1, g_2, \ldots, g_{k-1}, p'_0, p'_1, \ldots, p'_{k-1}] : \forall i \in \{0, 1, \ldots, k-1\},\ p'_i \in \delta(p_i, a)\}$$

for all $a \in \Sigma$.

• $F' = \{[g_1, g_2, \ldots, g_{k-1}, g_1, g_2, \ldots, g_{k-1}, t] : t \in F\}$. That is, we reach a state in $F'$ on
input $x$ exactly when the guessed states verify the conditions described above.

It should be clear from the construction that $M'$ accepts $L(M)^{1/k}$. The number of states
in $M'$ is $n^{2k-1} + 1$, as, except for $q'_0$, each state is a $(2k-1)$-tuple in which each coordinate
can take on $|Q| = n$ possible values. For each state there are at most $t^k$ distinct transitions.
Testing whether or not $M'$ accepts a non-empty word can be done in linear time (since
the only $\epsilon$-transitions are transitions outgoing from $q'_0$), so the running time of our algorithm
is $O(n^{2k-1} t^k)$. □
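The construction can be tried out on small automata. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: instead of materializing all $n^{2k-1} + 1$ states of $M'$, it explores the reachable tuples by breadth-first search, once per guess of $g_1, \ldots, g_{k-1}$. The function name `accepts_k_power` and the encoding of $\delta$ as a dictionary from (state, symbol) pairs to sets of states are hypothetical choices, and the NFA is assumed to have no $\epsilon$-transitions.

```python
from collections import deque
from itertools import product


def accepts_k_power(states, alphabet, delta, q0, finals, k):
    """Return True if the NFA accepts x^k for some non-empty word x.

    delta maps (state, symbol) -> set of states (an assumed encoding).
    """

    def successors(config):
        # Advance all k parallel simulations on the same input symbol.
        for a in alphabet:
            choices = [delta.get((p, a), ()) for p in config]
            yield from product(*choices)

    for guess in product(states, repeat=k - 1):      # g_1, ..., g_{k-1}
        start = (q0,) + guess                        # p_0, ..., p_{k-1}
        seen = set()
        queue = deque()
        # Read one symbol before testing acceptance, so that x is non-empty.
        for nxt in successors(start):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
        while queue:
            config = queue.popleft()
            # p_0..p_{k-2} must have reached g_1..g_{k-1}, and p_{k-1} a final state.
            if config[:k - 1] == guess and config[k - 1] in finals:
                return True
            for nxt in successors(config):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

For example, with an automaton accepting only the word abab, the call with k = 2 succeeds (the witness is x = ab), while k = 3 fails. The search is exponential in k, in line with the bound of Proposition 18.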

As before, we can use the same automaton to test if infinitely many $k$-powers are accepted.

Corollary 19. We can decide if an NFA $M$ with $n$ states and $t$ transitions accepts infinitely
many $k$-powers in $O(n^{2k-1} t^k)$ time.

If $k$ is not fixed, we have the following result, which is an immediate consequence of
Theorem 17 if $k$ is given in unary. However, the problem remains in PSPACE even if $k$ is
given in binary, as we now demonstrate.

Theorem 20. The problem NFA ACCEPTS A $k$-POWER is PSPACE-complete.

Proof. We first show that the problem is in PSPACE. By Savitch's theorem [29] it suffices
to give an NPSPACE algorithm. Let $M = (Q, \Sigma, \delta, q_0, F)$, where $Q = \{0, 1, \ldots, n-1\}$. For
$a \in \Sigma$, let $B_a$ be the $n \times n$ boolean matrix whose $(i, j)$ entry is 1 if $j \in \delta(i, a)$ and 0 otherwise.
Let $\mathcal{B}$ denote the semigroup generated by the $B_a$'s.

We non-deterministically guess a boolean matrix $B$ and verify that $B \in \mathcal{B}$ (i.e., $B = B_x$
for some $x \in \Sigma^*$), as illustrated in the proof of Theorem 17. Finally, we compute $B^k$
efficiently by repeated squaring and verify that $B^k$ contains a 1 in position $(q_0, f)$ for some
$f \in F$.

The proof for PSPACE-hardness is precisely that given in the proof of Theorem 17. □
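The deterministic part of this argument, computing $B_x^k$ by repeated squaring, is easy to illustrate. The sketch below is a minimal Python version under stated assumptions: the nondeterministic guess of $B$ is replaced by a word $x$ supplied by the caller, states are numbered $0, \ldots, n-1$, and the helper names (`bool_mul`, `bool_pow`, `accepts_power_of`) are hypothetical.

```python
def bool_mul(A, B):
    """Product of two n x n boolean matrices (AND/OR in place of times/plus)."""
    n = len(A)
    return [[any(A[i][m] and B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def bool_pow(B, k):
    """B^k by repeated squaring: O(log k) matrix products, so k may be given in binary."""
    n = len(B)
    result = [[i == j for j in range(n)] for i in range(n)]  # identity matrix
    while k > 0:
        if k & 1:
            result = bool_mul(result, B)
        B = bool_mul(B, B)
        k >>= 1
    return result


def accepts_power_of(x, k, n, delta, q0, finals):
    """Check whether the NFA accepts x^k; delta maps (state, symbol) -> set of states."""
    Bx = [[i == j for j in range(n)] for i in range(n)]
    for a in x:
        # B_a has a 1 in entry (i, j) exactly when j is in delta(i, a).
        Ba = [[j in delta.get((i, a), ()) for j in range(n)] for i in range(n)]
        Bx = bool_mul(Bx, Ba)
    Bk = bool_pow(Bx, k)
    return any(Bk[q0][f] for f in finals)
```

Because the exponent is halved at every step, a value such as $k = 10^{18}$ is handled with about 60 squarings, which is the point of allowing $k$ in binary.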

Theorem 21. For each integer $k \ge 2$, the problem NFA ACCEPTS A $\ge k$-POWER is
PSPACE-complete.

Proof. To show that the problem is in PSPACE, we use the same algorithm as in the proof of
Theorem 20, with the following modification. In order to verify that $M$ accepts an $\ell$-power
for some $\ell \ge k$, we first observe that by the same argument as in the proof of Proposition 45
below, if $M$ accepts such an $\ell$-power, then $M$ accepts an $\ell$-power for some $k \le \ell < k + n$. Thus,
after non-deterministically computing $B_x$, we must compute $B_x^\ell$ for all $k \le \ell < k + n$, and
verify that at least one $B_x^\ell$ contains a 1 in position $(q_0, f)$ for some $f \in F$.

To show PSPACE-hardness, we again reduce from the DFA INTERSECTION problem.
Suppose that we are given $r$ DFAs $A_1, A_2, \ldots, A_r$ and we wish to determine if the $A_i$'s
accept a common word $x$. We may suppose that $r \ge k$, since for any fixed $k$ such a restriction
does not affect the PSPACE-completeness of the DFA INTERSECTION problem. Let $j$
be the smallest non-negative integer such that $r + j$ is prime. By Bertrand's Postulate [12,
Theorem 418], we may take $j < r$. We now construct, in linear time, a DFA $M$ to accept
the language $L(A_1)\# L(A_2)\# \cdots L(A_r)\# (\Sigma^*\#)^j$. The DFA $M$ accepts a $\ge k$-power if and
only if it accepts an $(r+j)$-power. Moreover, $M$ accepts an $(r+j)$-power if and only if there
exists $x$ such that $x \in L(A_i)$ for $1 \le i \le r$. This completes the reduction. □

In a similar fashion, we now show that the following decision problems are
PSPACE-complete:

NFA ACCEPTS INFINITELY MANY $k$-POWERS.

INSTANCE: An NFA $M$ over the alphabet $\Sigma$ and an integer $k \ge 2$.
QUESTION: Does $M$ accept $x^k$ for infinitely many words $x$?

NFA ACCEPTS INFINITELY MANY $\ge k$-POWERS.

INSTANCE: An NFA $M$ over the alphabet $\Sigma$.

QUESTION: Are there infinitely many pairs $(x, i)$ such that $i \ge k$ and $M$ accepts
$x^i$?

Again, the NFA ACCEPTS INFINITELY MANY $\ge k$-POWERS problem is
actually an infinite family of problems, each indexed by an integer $k \ge 2$. We will prove that
these decision problems are PSPACE-complete by reducing from the following problem.

INFINITE CARDINALITY DFA INTERSECTION.

INSTANCE: An integer $k \ge 1$ and $k$ DFAs $A_1, A_2, \ldots, A_k$, each over the alphabet
$\Sigma$.

QUESTION: Do there exist infinitely many $x \in \Sigma^*$ such that $x$ is accepted by
each $A_i$, $1 \le i \le k$?

Lemma 22. The decision problem INFINITE CARDINALITY DFA INTERSECTION
is PSPACE-complete.

Proof. First, let's see that the problem is in PSPACE. If the largest DFA has $n$ states, then
there is a DFA with at most $n^k$ states that accepts $\bigcap_{1 \le i \le k} L(A_i)$. Now from Theorem 1(b),
we know that there exist infinitely many $x$ accepted by each $A_i$ if and only if there is a word
$x$ of length $\ell$, $n^k \le \ell < 2n^k$, accepted by all the $A_i$. We can simply guess the symbols of $x$,
ensuring with a counter that $n^k \le |x| < 2n^k$, and checking by simulation that $x$ is accepted
by all the $A_i$. The counter uses at most $k \log n + \log 2$ bits, which is polynomial in the size
of the input. This shows the problem is in nondeterministic polynomial space, and hence,
by Savitch's theorem [29], in PSPACE.

Now, to see that INFINITE CARDINALITY DFA INTERSECTION is PSPACE-hard,
we reduce from DFA INTERSECTION. For each DFA $A_i = (Q_i, \Sigma, \delta_i, q_{0,i}, F_i)$, we
modify it to $B_i$ as follows: we add a new initial state $q'_{0,i}$, and add the same transitions from
it as from $q_{0,i}$. We then change all final states to non-final, and we make $q'_{0,i}$ final. We add a
transition to $q'_{0,i}$ from all states that were previously final on a new letter ¢ (the same letter
is used for each $A_i$), and a transition from all other states on ¢ to a new dead state $d$. Finally, we
add transitions on all letters from $d$ to itself. We claim $B_i$ is a DFA and $L(B_i) = (L(A_i)¢)^*$.
Furthermore, $\bigcap_{1 \le i \le k} L(A_i) \ne \emptyset$ if and only if $\bigcap_{1 \le i \le k} L(B_i)$ is infinite.

Suppose $\bigcap_{1 \le i \le k} L(A_i) \ne \emptyset$. Then there exists $x$ accepted by each of the $A_i$. Then $(x¢)^*$
is accepted by each of the $B_i$, so $\bigcap_{1 \le i \le k} L(B_i)$ is infinite.

Now suppose $\bigcap_{1 \le i \le k} L(B_i)$ is infinite. Choose any nonempty $x \in \bigcap_{1 \le i \le k} L(B_i) =
\bigcap_{1 \le i \le k} (L(A_i)¢)^*$. Thus $x$ must be of the form $y_1 ¢ y_2 ¢ \cdots y_j ¢$ for some $j \ge 1$, where each
$y_m$, $1 \le m \le j$, is accepted by all the $A_i$. Hence, in particular, $y_1$ is accepted by all the $A_i$, and so
$\bigcap_{1 \le i \le k} L(A_i) \ne \emptyset$. □
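The modification of each DFA used in this reduction is mechanical, and a small Python sketch may make it concrete. It is an illustration only: the function names, the state names `"new_start"` and `"dead"`, and the use of `#` as the fresh marker letter are hypothetical, and the input DFA is assumed to be complete (its transition dictionary is total). One detail is an assumption of this sketch rather than something spelled out in the text: the new initial state behaves on the marker exactly as the old initial state would, so that the empty word is handled consistently.

```python
def marker_star(states, alphabet, delta, start, finals, marker="#"):
    """Given a complete DFA A, return a DFA (5-tuple) for (L(A) marker)*.

    delta maps (state, symbol) -> state and must be total on states x alphabet.
    """
    new_start, dead = "new_start", "dead"   # hypothetical fresh state names
    assert marker not in alphabet
    assert new_start not in states and dead not in states

    sigma = set(alphabet) | {marker}
    new_delta = dict(delta)
    # The new initial state copies the outgoing transitions of the old one.
    for a in alphabet:
        new_delta[(new_start, a)] = delta[(start, a)]
    # On the marker: formerly final states restart, all others die.
    for q in states:
        new_delta[(q, marker)] = new_start if q in finals else dead
    # Assumption of this sketch: the new start mirrors the old start on the marker.
    new_delta[(new_start, marker)] = new_start if start in finals else dead
    for a in sigma:
        new_delta[(dead, a)] = dead
    # Only the new initial state is final.
    return set(states) | {new_start, dead}, sigma, new_delta, new_start, {new_start}


def dfa_accepts(dfa, word):
    """Run a DFA given as (states, alphabet, delta, start, finals) on word."""
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals
```

If A accepts the words of odd length over {a}, the resulting automaton accepts the empty word, a#, and a#aaa#, and rejects aa#, a, and #.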

We are now ready to prove 

Theorem 23. The decision problem NFA ACCEPTS INFINITELY MANY $k$-POWERS
is PSPACE-complete.



Proof. First, let's see that the problem is in PSPACE. We claim that an NFA $M$ with $n$ states
accepts infinitely many $k$-powers if and only if it accepts a $k$-power $x^k$ with $2^{n^2} \le |x| < 2^{n^2+1}$.

One direction is clear. For the other direction, we use boolean matrices, as in the proof
of Theorem 20. We can construct a DFA $M' = (Q', \Sigma, \delta', q'_0, F')$ of $2^{n^2}$ states that accepts
$L(M)^{1/k} = \{x \in \Sigma^* : x^k \in L(M)\}$, as follows: the states are $n \times n$ boolean matrices. The
initial state $q'_0$ is the identity matrix. If $B_a$ is the boolean matrix with a 1 in entry $(i, j)$
if $q_j \in \delta(q_i, a)$ and 0 otherwise, then $\delta'(B, a) = BB_a$. The set of final states is $F' = \{B :$
the $(0, j)$ entry of $B^k$ is 1 for some $q_j \in F\}$.

The idea of this construction is that if $x = a_1 a_2 \cdots a_i$, then $\delta'(q'_0, x) = B_{a_1} \cdots B_{a_i}$. Now
we use Theorem 1(b) to conclude that $M'$ accepts infinitely many words if and only if it
accepts a word $x$ with $2^{n^2} \le |x| < 2^{n^2+1}$. But $L(M') = L(M)^{1/k}$.

Thus, to check if $M$ accepts infinitely many $k$-powers, we simply guess the symbols of
$x$, stopping when $2^{n^2} \le |x| < 2^{n^2+1}$, and verify that $M$ accepts $x^k$. We can do this by
accumulating $B_{a_1} \cdots B_{a_i}$ and raising the result to the $k$-th power, as before. We need $n^2 + 1$
bits to keep track of the counter, so the result is in NPSPACE, and hence in PSPACE.

Now we argue that NFA ACCEPTS INFINITELY MANY $k$-POWERS is PSPACE-hard.
To do so, we reduce from INFINITE CARDINALITY DFA INTERSECTION.
Given DFAs $A_1, A_2, \ldots, A_k$, we can easily construct a DFA $A$ to accept $L(A_1)\# \cdots L(A_k)\#$.
Clearly $A$ accepts infinitely many $k$-powers if and only if $\bigcap_{1 \le i \le k} L(A_i)$ is infinite. □

Theorem 24. For each integer $k \ge 2$, the problem NFA ACCEPTS INFINITELY
MANY $\ge k$-POWERS is PSPACE-complete.

Proof. Left to the reader. □ 



6 Testing if an NFA accepts a non-$k$-power

In the previous section we showed that it is computationally hard to test if an NFA accepts
a $k$-power (when $k$ is not fixed). In this section we show how to test if an NFA accepts a
non-$k$-power. Again, we find it more congenial to discuss the opposite problem, which is
whether an NFA accepts nothing but $k$-powers.

First, we need several classical results from the theory of combinatorics on words. The
following theorem is due to Lyndon and Schützenberger [21].

Theorem 25. If $x$, $y$, and $z$ are words satisfying an equation $x^i y^j = z^k$, where $i, j, k \ge 2$,
then they are all powers of a common word.

The next result is also due to Lyndon and Schützenberger.

Theorem 26. Let $u$ and $v$ be non-empty words. If $uv = vu$, then there exists a word $x$ and
integers $i, j \ge 1$, such that $u = x^i$ and $v = x^j$. In other words, $u$ and $v$ are powers of a
common word.

The following result can be derived from Theorem 26.

Corollary 27. Let $u$ and $v$ be non-empty words. If $u^r = v^s$ for some $r, s \ge 1$, then $u$ and $v$
are powers of a common word.

Ito, Katsura, Shyr, and Yu [17] gave a proof of the next proposition.

Proposition 28. Let $u$ and $v$ be non-empty words. If $u$ and $v$ are not powers of a common
word, then for any integers $r, s \ge 1$, $r \ne s$, at least one of $u^r v$ or $u^s v$ is primitive.

The next result is due to Shyr and Yu [32].

Theorem 29. Let $p$ and $q$ be primitive words, $p \ne q$. The set $p^+ q^+$ contains at most one
non-primitive word.

Next we prove the following analogue of Theorem from which we will derive an efficient 
algorithm for testing if a finite automaton accepts only $k$-powers.

Theorem 30. Let $L$ be accepted by an $n$-state NFA $M$ and let $k \ge 2$ be an integer.

1. Every word in $L$ is a $k$-power if and only if every word in the set $\{x \in L : |x| \le 3n\}$
is a $k$-power.

2. All but finitely many words in $L$ are $k$-powers if and only if every word in the set
$\{x \in L : n \le |x| \le 3n\}$ is a $k$-power.

Further, if $M$ is a DFA over an alphabet of size $\ge 2$, then the bound $3n$ may be replaced by
$3n - 3$.
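Part (1) of the theorem already yields a naive decision procedure: enumerate the accepted words of length at most $3n$ and test each one. The Python sketch below illustrates this under stated assumptions; it is not the efficient algorithm developed later in this section, since the enumeration can be exponential. The names `is_k_power` and `only_k_powers` are hypothetical, $\delta$ is encoded as a dictionary from (state, symbol) pairs to sets of states, there are no $\epsilon$-transitions, and the empty word is skipped rather than classified.

```python
def is_k_power(w, k):
    """True if w = z^k for some non-empty word z."""
    if not w or len(w) % k:
        return False
    block = len(w) // k
    return w == w[:block] * k


def only_k_powers(states, alphabet, delta, q0, finals, k):
    """Check that every non-empty accepted word of length <= 3n is a k-power.

    By part (1) of Theorem 30 this decides whether every non-empty word
    of the language is a k-power. Worst-case exponential: a sketch only.
    """
    bound = 3 * len(states)
    finals = frozenset(finals)
    # Depth-first enumeration of words, tracking the set of reachable states.
    stack = [("", frozenset([q0]))]
    while stack:
        word, current = stack.pop()
        if word and current & finals and not is_k_power(word, k):
            return False
        if len(word) < bound:
            for a in alphabet:
                nxt = frozenset(q for p in current for q in delta.get((p, a), ()))
                if nxt:  # prune prefixes that cannot be extended inside the NFA
                    stack.append((word + a, nxt))
    return True
```

On the two-state automaton for $(aa)^*$ the check succeeds for $k = 2$ and fails for $k = 3$, since $aa$ is accepted but is not a cube.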

Ito, Katsura, Shyr, and Yu [17] proved a similar result for primitive words: namely, that
if $L$ is accepted by an $n$-state DFA over an alphabet of two or more letters and contains
a primitive word, then it contains a primitive word of length $\le 3n - 3$. In other words,
every word in $L$ is a power if and only if every word in the set $\{x \in L : |x| \le 3n - 3\}$ is a
power. However, this result does not imply Theorem 30, as one can easily construct a regular
language $L$ where every word in $L$ that is not a $k$-power is nevertheless non-primitive: for
example, $L = \{a^{k+1}\}$.

We shall use the next result to characterize those regular languages consisting only of
$k$-powers.
Proposition 31. Let $u$, $v$, and $w$ be words, $v \ne \epsilon$, $uw \ne \epsilon$, and let $f, g \ge 1$ be integers,
$f \ne g$. If $uv^f w$ and $uv^g w$ are non-primitive, then $uv^n w$ is non-primitive for all integers
$n \ge 1$. Further, if $uvw$ and $uv^2 w$ are $k$-powers for some integer $k \ge 2$, then $v$ and $uv^n w$ are
$k$-powers for all integers $n \ge 1$.

Proof. Suppose $uv^f w$ and $uv^g w$ are non-primitive. Then $v^f wu$ and $v^g wu$ are non-primitive.
Let $x$ and $y$ be the primitive roots of $v$ and $wu$, respectively, so that $v = x^i$ and $wu = y^j$ for
some integers $i, j \ge 1$. If $x \ne y$, then by Proposition 28 one concludes that at least one of
$v^f wu$ or $v^g wu$ is primitive, a contradiction.

If $x = y$, then for all integers $n \ge 1$, $v^n wu = x^{ni+j}$ is clearly non-primitive, and
consequently, $uv^n w$ is non-primitive, as required. Let us now suppose that $uvw$ and $uv^2 w$ are
$k$-powers for some $k \ge 2$. Then $vwu = x^{i+j}$ and $v^2 wu = x^{2i+j}$ are both $k$-powers as well.
We claim that the following must hold:

$$i + j \equiv 0 \pmod{k}$$
$$2i + j \equiv 0 \pmod{k}.$$

To see this, write $vwu = z^k$ for some word $z$. Then $z^k = x^{i+j}$, so by Corollary 27 $z$ and
$x$ are powers of a common word. Since $x$ is primitive it follows that $z$ is a power of $x$. In
particular, $|x|$ divides $|z|$ and $i + j$ is a multiple of $k$, as claimed. A similar argument applies
to $v^2 wu$.

We conclude that $i \equiv j \equiv 0 \pmod{k}$, and hence, $v = x^i$ is a $k$-power. Moreover,
$v^n wu = x^{ni+j}$ is also a $k$-power for all integers $n \ge 1$, and consequently, $uv^n w$ is a $k$-power,
as required. □

The characterization due to Ito et al. [TTJ, Proposition 10] (see also Domosi, Horvath, 
and Ito [7J, Theorem 3]) of the regular languages consisting only of powers, along with 
Theorem [HI implies that any such language is slender. A simple application of the Myhill- 
Nerode Theorem gives the following weaker result. 

Proposition 32. Let $L$ be a regular language and let $k \ge 2$ be an integer. If all but finitely
many words of $L$ are $k$-powers, then $L$ is slender. In particular, if $L$ is accepted by an $n$-state
DFA and all words in $L$ of length $\ge \ell$ are $k$-powers, then for all $r \ge \ell$, the number of words
in $L$ of length $r$ is at most $n$.

Proof. Let $x^k$ and $y^k$ be distinct words in $L$ of length $r \ge \ell$. Then $x$ and $y$ are inequivalent
with respect to the Myhill-Nerode equivalence relation, since $y^k \in L$ but $xy^{k-1} \notin L$. The
Myhill-Nerode equivalence relation on $L$ thus has index at least as large as the number of
distinct words of length $r$ in $L$. Since the index of the Myhill-Nerode relation is at most $n$,
it follows that there is a bounded number of words of length $r$ in $L$, so that $L$ is slender, as
required. □

The following characterization is analogous to the characterization of palindromic regular
languages given in [15, Theorem 8].
Theorem 33. Let $L \subseteq \Sigma^*$ be a regular language and let $k \ge 2$ be an integer. The language
$L$ consists only of $k$-powers if and only if it can be written as a finite union of languages of
the form $uv^*w$, where $u, v, w \in \Sigma^*$ satisfy the following: there exists a primitive word $x \in \Sigma^*$
and integers $i, j \ge 0$ such that $v = x^{ik}$ and $wu = x^{jk}$.

Proof. The "if" direction is clear; we prove the "only if" direction. Let L consist only of 
/c-powers. Then by Proposition [32J, L is slender. By Theorem [TTJ, L can be written as a 
finite union of languages of the form uv*w. By examining the proof of Proposition [311 one 
concludes that u, v, and w have the desired properties. □ 

We shall need the following lemma for the proof of Theorem 30.

Lemma 34. Let $L$ be a regular language accepted by an $n$-state NFA $M$ and let $k \ge 2$ be
an integer. If $L$ contains a non-$k$-power of length $\ge n$, then $L$ contains infinitely many
non-$k$-powers.

Proof. Let $s \in L$ be a non-$k$-power such that $|s| \ge n$. Consider an accepting computation of
$M$ on $s$. Such a computation must contain at least one repeated state. It follows that there
exists a decomposition $s = uvw$, $v \ne \epsilon$, such that $uv^*w \subseteq L$. Let $x$ be the primitive root of
$v$, so that $v = x^i$ for some positive integer $i$.

Suppose that $wu = \epsilon$. Since $s = v = x^i$ is not a $k$-power, it follows that $i \not\equiv 0 \pmod{k}$.
Moreover, there exist infinitely many positive integers $\ell$ such that $\ell i \not\equiv 0 \pmod{k}$, and so by
Corollary 27, there exist infinitely many words of the form $v^\ell = x^{\ell i}$ that are non-$k$-powers
in $L$, as required.

Suppose then that $wu \ne \epsilon$. Let $y$ be the primitive root of $wu$, so that $wu = y^j$ for some
positive integer $j$. We have two cases.

Case 1: $x = y$. Since $uvw$ is not a $k$-power, $vwu$ is also not a $k$-power, and thus we
have $i + j \not\equiv 0 \pmod{k}$. Moreover, there are infinitely many positive integers $\ell$ such that
$\ell i + j \not\equiv 0 \pmod{k}$. For all such $\ell$, the word $v^\ell wu = x^{\ell i + j}$ is not a $k$-power, and hence
the word $uv^\ell w$ is a non-$k$-power in $L$. We thus have infinitely many non-$k$-powers in $L$, as
required.

Case 2: $x \ne y$. By Theorem 29, $v^* wu$ contains infinitely many primitive words. Thus,
$uv^*w$ contains infinitely many non-$k$-powers, as required. □

We are now ready to prove Theorem 30.

Proof of Theorem 30. The proof is similar to that of [17, Proposition 7]. It suffices to
prove statement (2) of the theorem, since statement (1) follows immediately from (2) and
Lemma 34.

Suppose that $L$ contains infinitely many non-$k$-powers. Then $L$ contains a non-$k$-power
$s$ with $|s| \ge n$. Suppose, contrary to statement (2), that a shortest such $s$ has $|s| > 3n$.
Then any computation of $M$ on $s$ must repeat some state at least 4 times. It follows that
there exists a decomposition $s = uv_1v_2v_3w$, $v_1, v_2, v_3 \ne \epsilon$, such that $uv_1^*v_2^*v_3^*w \subseteq L$. We may
assume further that $|v_1v_2v_3| \le 3n$, so that $wu \ne \epsilon$.

Let $p_1$, $p_2$, $p_3$, and $q$ be the primitive roots of $v_1$, $v_2$, $v_3$, and $wu$, respectively. Let $v_1 = p_1^{i_1}$,
$v_2 = p_2^{i_2}$, $v_3 = p_3^{i_3}$, and $wu = q^j$, for some integers $i_1, i_2, i_3, j > 0$. We consider three cases.

Case 1: $p_1 = p_2 = p_3 = q$. Without loss of generality, suppose that $|v_1| \le |v_2| \le |v_3|$.
Since $|s| > 3n$, we must have $|v_3wu| > n$, and thus $|v_1v_3wu| > n$ and $|v_2v_3wu| > n$. By
assumption, the words $v_3wu = q^{i_3+j}$, $v_1v_3wu = q^{i_1+i_3+j}$, and $v_2v_3wu = q^{i_2+i_3+j}$ are $k$-powers,
whereas the word $v_1v_2v_3wu = q^{i_1+i_2+i_3+j}$ is not. Applying Corollary 27, we deduce that the
following system of equations

$$i_1 + i_2 + i_3 + j \not\equiv 0 \pmod{k}$$
$$i_3 + j \equiv 0 \pmod{k}$$
$$i_1 + i_3 + j \equiv 0 \pmod{k}$$
$$i_2 + i_3 + j \equiv 0 \pmod{k}$$

must be satisfied. However, it is easy to see that this is impossible.

Case 2: $p_1 \ne q$ and $p_2 = p_3 = q$. If $|v_1| \le n$, then let $\ell$ be the smallest positive integer
such that $n \le |v_1^\ell wu| < |v_1^{\ell+1} wu| < |s|$. Then by Proposition 28 one of the words $v_1^\ell wu$ or
$v_1^{\ell+1} wu$ is primitive. Hence, at least one of the words $uv_1^\ell w$ or $uv_1^{\ell+1} w$ is a primitive word in
$L$, contradicting the minimality of $s$.

If, instead, $|v_1| > n$, then we have $n < |v_1wu| < |v_1v_2wu| < |s|$. Again, by Proposition 28
one of the words $v_1wu$ or $v_1v_2wu$ is primitive. Hence, at least one of the words $uv_1w$
or $uv_1v_2w$ is a primitive word in $L$, contradicting the minimality of $s$.

Case 3: $p_1 \ne q$ and $p_2 \ne q$. In this case we choose the smaller of $v_1$ and $v_2$ to "pump",
so without loss of generality, suppose $|v_1| \le |v_2|$. Let $\ell$ be the smallest positive integer such
that $n \le |v_1^\ell wu| < |v_1^{\ell+1} wu| < |s|$. Note that $|v_1^2 wu| \le |v_1v_2wu| < |s|$, so such an $\ell$ must
exist. Then by Proposition 28 one of the words $v_1^\ell wu$ or $v_1^{\ell+1} wu$ is primitive. Hence, at least
one of the words $uv_1^\ell w$ or $uv_1^{\ell+1} w$ is a primitive word in $L$, contradicting the minimality of $s$.

All remaining possibilities are symmetric to the cases considered above. Since in all
cases we derive a contradiction, it follows that if $L$ contains infinitely many non-$k$-powers,
it contains a non-$k$-power $s$, where $n \le |s| \le 3n$.

It remains to consider the situation where $M$ is a DFA over an alphabet of size $\ge 2$. Let
$a \ne b$ be alphabet symbols of $M$. If $M$ does not have a dead state, then for every integer
$i \ge n - 1$, there exists a word $x$, $|x| \le n - 1$, such that $a^i b x \in L$. These words $a^i b x$ are all
distinct and primitive. Thus, whenever $M$ has no dead state, $M$ always accepts infinitely
many non-$k$-powers, and, in particular, $M$ accepts a non-$k$-power $s$, where $n \le |s| \le 2n - 1$.

If, on the other hand, $M$ does have a dead state, then we may delete this dead state and
apply the earlier argument with the bound $3n - 3$ in place of $3n$.

Finally, the converse of statement (2) follows immediately from Lemma 34. □

We can now deduce the following algorithmic result. 

Theorem 35. Let $k \ge 2$ be an integer. Given an NFA $M$ with $n$ states and $t$ transitions, it
is possible to determine if every word in $L(M)$ is a $k$-power in $O(n^3 + tn^2)$ time.

Proof. The proof is exactly analogous to that of Theorem [HI and we only indicate what needs
to be changed. Suppose $M$ has $n$ states. We create an NFA, $M'_r$, for $r = 3n$, such that no
word in $L(M'_r)$ is a $k$-power, and $M'_r$ accepts all non-$k$-powers of length $\le r$ (and perhaps
some other non-$k$-powers).

Note that we may assume that $k \le r$. If $k > r$, then no word of length $\le r$ is a $k$-power.
In this case, to obtain the desired answer it suffices to test if the set $\{x \in L(M) : |x| \le r\}$
is empty. However, this set is empty if and only if $L(M)$ is empty, and this is easily verified
in linear time.

We now form a new NFA $A$ as the cross product of $M'_r$ with $M$. From Theorem 30 it
follows that $L(A) = \emptyset$ iff every word in $L(M)$ is a $k$-power. We can determine if $L(A) = \emptyset$
by checking (using depth-first search) whether any final states of $A$ are reachable from the
start state.

It remains to see how $M'_r$ is constructed. If the length of a word $x$ accepted by $M'_r$ is a
multiple of $k$, $x$ can be partitioned into $k$ sections of equal length. In order for $M'_r$ to accept
$x$, the NFA must 'verify' a symbol mismatch between two symbols found in different sections
but in the same position.

If $x$ is a non-$k$-power, then a symbol mismatch will occur between two sections of $x$, call
them $s_i$ and $s_j$. This means that $s_i$ and $s_j$ differ in at least one position. Comparing $s_i$ and
$s_j$ to $s_1$, the first section of $x$, we notice that at least one of $s_i$ or $s_j$ must have a symbol
mismatch with $s_1$ (otherwise $s_1 = s_i = s_j$, which would give a contradiction). Therefore,
when checking $x$ for a symbol mismatch, it is sufficient to only check $s_1$ against each of the
remaining $k - 1$ sections, as opposed to checking all $\binom{k}{2}$ possibilities.

In order to construct $M'_r$, we create a series of 'lobes', each of which is connected to the
start state by an $\epsilon$-transition. Each lobe represents three simultaneous 'guesses' made by
the NFA, which are:

• Which alphabet symbols will conflict and in which order. The number of possible
conflict pairs is $|\Sigma|(|\Sigma| - 1)$.

• The section in which there will be a symbol mismatch with the first section. There are
$k - 1$ possible sections.

• The position in which the conflict will occur. In the worst case when the length of the
input is $r$, there will be at most $r/k$ possible positions.

This gives a total of at most $|\Sigma|(|\Sigma| - 1) \cdot (k - 1) \cdot r/k$ lobes. The construction of each
lobe is illustrated in Figure 2.

Each lobe contains at most $r + 1$ states. In addition to these lobes, we also require a
$k$-state submachine to accept all words whose lengths are not a multiple of $k$.

In total, $M'_r$ has at most

$$|\Sigma|(|\Sigma| - 1) \cdot (k - 1) \cdot \frac{r}{k} \cdot (r + 1) + k + 1 \in O(r^2)$$
Figure 2: One lobe of the NFA for $k = 3$, $r = 12$ and 0, 1 conflicting symbols.



states (since $k \le r$), and similarly, $O(r^2)$ transitions. After constructing the cross-product,
this gives an $O(n^3 + tn^2)$ bound on the time required to determine if every word in $L(M)$ is
a $k$-power. □

Theorem 30 suggests the following question: if $M$ is an NFA with $n$ states that accepts
at least one non-$k$-power, how long can a shortest non-$k$-power be? Theorem 30 proves
an upper bound of $3n$. A lower bound of $2n - 1$ for infinitely many $n$ follows easily from
the obvious $(n+1)$-state NFA accepting $a^n(a^{n+1})^*$, where $n$ is divisible by $k$. However,
Ito, Katsura, Shyr, and Yu [17] gave a very interesting example that improves this lower
bound: if $x = ((ab)^n a)^2$ and $y = baxab$, then $x$ and $xyx$ are squares, but $xyxyx$ is not a
power. Hence, the obvious $(8n + 8)$-state NFA that accepts $x(yx)^*$ has the property that
the shortest non-$k$-power accepted is of length $20n + 18$. This improves the lower bound for
infinitely many $n$.

We now generalize their lower bound. 

Proposition 36. Let $k \ge 2$ be fixed. There exist infinitely many NFAs $M$ with the property
that if $M$ has $r$ states, then the shortest non-$k$-power accepted is of length $(2 + \frac{1}{2k-2}) r - O(1)$.

Proof. Let $u = (ab)^n a$, $x = u^k$, and $y = x^{-1}(xbau^{-1}x)^k x^{-1}$. Thus $xyx = (xbau^{-1}x)^k$. Hence
$x$ and $xyx$ are both $k$-powers.

However, $xyxyx$ is not a $k$-power. To see this, assume it is, and write $xyxyx = g_1 g_2 \cdots g_k$.
Look at the character in position $2kn - 2n + k$ (indexing beginning with 1) in $g_1$ and $g_2$. In
$g_1$ it is $a$, and in $g_2$ it is $b$, so $xyxyx$ is not a $k$-power.

We can accept $x(yx)^*$ with an NFA using $|xy|$ states. The shortest non-$k$-power is $xyxyx$,
which is of length $m$.

We have $|u| = 2n + 1$, $|x| = k(2n + 1)$, $|y| = k(4kn - 6n + 2k - 1)$, $r = |xy| = 2k(2kn - 2n + k)$,
and $m = |xyxyx| = k(8kn - 6n + 4k + 1)$. Thus $m = \frac{4k-3}{2k-2} r - \frac{k}{k-1} = (2 + \frac{1}{2k-2}) r - O(1)$. □
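The words in this proof are easy to build explicitly, which gives a quick sanity check of the length formulas for small parameters. The Python sketch below is an illustration only; the function names are hypothetical, and the cancellation $u^{-1}x$ is read as removing one copy of $u$ from the front of $x = u^k$.

```python
def is_k_power(w, k):
    """True if w = z^k for some non-empty word z."""
    if not w or len(w) % k:
        return False
    block = len(w) // k
    return w == w[:block] * k


def proposition_36_words(n, k):
    """Build x and y with x = u^k and x y x = (x b a u^{-1} x)^k, u = (ab)^n a."""
    u = "ab" * n + "a"
    x = u * k
    z = x + "ba" + x[len(u):]            # the word x b a u^{-1} x
    zk = z * k
    # y is obtained by cancelling x from both ends of z^k.
    assert zk.startswith(x) and zk.endswith(x)
    y = zk[len(x):len(zk) - len(x)]
    return x, y


def check(n, k):
    """Return the lengths (r, m) after checking the power properties."""
    x, y = proposition_36_words(n, k)
    assert is_k_power(x, k) and is_k_power(x + y + x, k)
    assert not is_k_power(x + y + x + y + x, k)
    return len(x + y), len(x + y + x + y + x)
```

For $k = 2$ this reproduces the example of Ito et al.: `check(n, 2)` returns the pair $(8n + 8, 20n + 18)$.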

Next, we apply part (2) of Theorem 30 to obtain an algorithm to check if an NFA accepts
infinitely many non-$k$-powers.

Theorem 37. Let $k \ge 2$ be an integer. Given an NFA $M$ with $n$ states and $t$ transitions, it
is possible to determine if all but finitely many words in $L(M)$ are $k$-powers in $O(n^3 + tn^2)$
time.

Proof. The proof is similar to that of Theorem 35. The only difference is that in view
of part (2) of Theorem 30 we instead construct $M'_r$ to accept all non-$k$-powers $s$, where
$n \le |s| \le 3n$. We leave the details to the reader. □



7 Automata accepting only powers 

In this section we move from the problem of testing if an automaton accepts only $k$-powers
to the problem of testing if it accepts only powers (of any kind). Just as Theorem 30 was the
starting point for our algorithmic results in Section 6, the following theorem of Ito, Katsura,
Shyr, and Yu [17] is the starting point for our algorithmic results in this section. We state
the theorem in a stronger form than was originally presented by Ito et al.

Theorem 38. Let $L$ be accepted by an $n$-state NFA $M$.

1. Every word in $L$ is a power if and only if every word in the set $\{x \in L : |x| \le 3n\}$ is
a power.

2. All but finitely many words in $L$ are powers if and only if every word in the set
$\{x \in L : n \le |x| \le 3n\}$ is a power.

Further, if $M$ is a DFA over an alphabet of size $\ge 2$, then the bound $3n$ may be replaced by
$3n - 3$.

We next prove an analogue of Proposition [32j We need the following result, first proved 
by Birget [3j . and later, independently, in a weaker form, by Glaister and Shallit [TTJ . 

Theorem 39. Let $L \subseteq \Sigma^*$ be a regular language. Suppose there exists a set of pairs

$$S = \{(x_i, y_i) \in \Sigma^* \times \Sigma^* : 1 \le i \le n\}$$

such that

• $x_i y_i \in L$ for $1 \le i \le n$, and

• either $x_i y_j \notin L$ or $x_j y_i \notin L$ for $1 \le i, j \le n$, $i \ne j$.

Then any NFA accepting $L$ has at least $n$ states.

Proposition 40. Let $M$ be an $n$-state NFA and let $\ell$ be a non-negative integer such that
every word in $L(M)$ of length $\ge \ell$ is a power. For all $r \ge \ell$, the number of words in $L(M)$
of length $r$ is at most $7n$.

Proof. Let $r \ge \ell$ be an arbitrary integer. The proof consists of three steps.

Step 1. We consider the set $A$ of words $w$ in $L(M)$ such that $|w| = r$ and $w$ is a $k$-power
for some $k \ge 4$. For each such $w$, write $w = x^i$, where $x$ is a primitive word, and define a
pair $(x^2, x^{i-2})$. Let $S_A$ denote the set of such pairs. Consider two pairs in $S_A$: $(x^2, x^{i-2})$ and
$(y^2, y^{j-2})$. The word $x^2 y^{j-2}$ is primitive by Theorem 25 and hence is not in $L(M)$. The set
$S_A$ thus satisfies the conditions of Theorem 39. Since $L(M)$ is accepted by an $n$-state NFA,
we must have $|S_A| \le n$ and thus $|A| \le n$.

Step 2. Next we consider the set $B$ of cubes of length $r$ in $L(M)$. For each such cube
$w = x^3$, we define a pair $(x, x^2)$. Let $S_B$ denote the set of such pairs. Consider two pairs
in $S_B$: $(x, x^2)$ and $(y, y^2)$. Suppose that $xy^2$ and $yx^2$ are both in $L(M)$. The word $xy^2$ is
certainly not a cube; we claim that it cannot be a square. Suppose it were. Then $|x|$ and
$|y|$ are even, so we can write $x = x_1x_2$ and $y = y_1y_2$ where $|x_1| = |x_2| = |y_1| = |y_2|$. Now if
$xy^2 = x_1x_2y_1y_2y_1y_2$ is a square, then $x_1x_2y_1 = y_2y_1y_2$, and so $y_1 = y_2$. Thus $y$ is a square;
write $y = z^2$. By Theorem 25, $yx^2 = z^2x^2$ is primitive, contradicting our assumption that
$yx^2 \in L(M)$. It must be the case then that $xy^2$ is a $k$-power for some $k \ge 4$. Thus, $xy^2 = u^k$
for some primitive $u$ uniquely determined by $x$ and $y$. With each pair of cubes $x^3$ and $y^3$
such that both $xy^2$ and $yx^2$ are in $L(M)$ we may therefore associate a $k$-power $u^k \in L(M)$
of length $r$, where $k \ge 4$. We have already established in Step 1 that the number of such
$k$-powers is at most $n$. It follows that by deleting at most $n$ pairs from the set $S_B$ we obtain
a set of pairs satisfying the conditions of Theorem 39. We must therefore have $|S_B| \le 2n$
and thus $|B| \le 2n$.

Step 3. Finally we consider the set $C$ of squares of length $r$ in $L(M)$. For each such
square $w = x^2$, we define a pair $(x, x)$. Let $S_C$ denote the set of such pairs. Consider two
pairs in $S_C$: $(x, x)$ and $(y, y)$. Suppose that $xy$ and $yx$ are both in $L(M)$. The word $xy$
is not a square and must therefore be a $k$-power for some $k \ge 3$. We write $xy = u^k$ for
some primitive $u$ uniquely determined by $x$ and $y$. In Steps 1 and 2 we established that the
number of $k$-powers of length $r$, $k \ge 3$, is $|A| + |B| \le 3n$. It follows that by deleting at most
$3n$ pairs from the set $S_C$ we obtain a set of pairs satisfying the conditions of Theorem 39.
We must therefore have $|S_C| \le 4n$ and thus $|C| \le 4n$.

Putting everything together, we see that there are $|A| + |B| + |C| \le 7n$ words of length
$r$ in $L(M)$, as required. □

The bound of $7n$ in Proposition 40 is almost certainly not optimal.
We now prove the following algorithmic result.

Theorem 41. Given an NFA $M$ with $n$ states, it is possible to determine if every word in
$L(M)$ is a power in $O(n^5)$ time.

Proof. First, we observe that we can test whether a word $w$ of length $n$ is a power in $O(n)$
time, using a linear-time string matching algorithm, such as Knuth-Morris-Pratt [TH]. To do
so, search for $w = a_1a_2 \cdots a_n$ in the word $x = a_2 \cdots a_n a_1 \cdots a_{n-1}$. Then $w$ appears in $x$ iff $w$
is a power. Furthermore, if the leftmost occurrence of $w$ in $x$ appears beginning at $a_i$, then
$w$ is an $n/(i-1)$ power, and this is the largest exponent of a power that $w$ is.
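This test translates almost verbatim into code. The Python sketch below is a minimal illustration; the function name `largest_exponent` is hypothetical, and Python's built-in `str.find` stands in for Knuth-Morris-Pratt (it is not guaranteed to be linear-time in the worst case, so the $O(n)$ bound is not claimed for this version).

```python
def largest_exponent(w):
    """Return the largest e such that w = z^e; the result is 1 iff w is primitive.

    Searches for w inside x = a_2 ... a_n a_1 ... a_{n-1}, as in the proof.
    """
    if not w:
        raise ValueError("the empty word has no well-defined exponent")
    x = w[1:] + w[:-1]
    pos = x.find(w)          # 0-based offset of the leftmost occurrence, or -1
    if pos == -1:
        return 1             # w does not occur in x: w is not a power
    # An occurrence starting at letter a_i of x sits at offset i - 2,
    # so the period is i - 1 = pos + 1 and the exponent is n / (i - 1).
    return len(w) // (pos + 1)
```

For instance, searching for `abab` in `bababa` finds it at offset 1, giving period 2 and exponent 2.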

Now, using Theorem 38, it suffices to test all words in $L(M)$ of length $\le 3n$; every word
in $L(M)$ is a power iff all of these words are powers. On the other hand, by Proposition 40,
if all words are powers, then the number of words of each length is bounded by $7n$. Thus,
it suffices to enumerate the words in $L(M)$ of lengths $1, 2, \ldots, 3n$, stopping if the number
of such words in any length exceeds $7n$. If all these words are powers, then every word is a
power. Otherwise, if we find a non-power, or if the number of words in any length exceeds
$7n$, then not every word is a power.

By the work of Mäkinen [22] or Ackerman & Shallit [1], we can enumerate these words
in $O(n^5)$ time. □

Using part (2) of Theorem 38 along with Proposition 40, we can prove the following.

Theorem 42. Given an NFA $M$ with $n$ states, we can decide if all but finitely many words
in $L(M)$ are powers in $O(n^5)$ time.

Proof. The proof is analogous to that of Theorem 41. The only difference is that here we
need only enumerate the words in $L(M)$ of lengths $n, n + 1, \ldots, 3n$. □



8 Bounding the length of a smallest power 

In Section 6 we gave an upper bound on the length of a smallest non-$k$-power accepted by
an $n$-state NFA. In this section we study the complementary problem of bounding the length
of the smallest $k$-power accepted by an $n$-state NFA.

Proposition 43. Let $M$ be an NFA with $n$ states and let $k \ge 2$ be an integer. If $L(M)$
contains a $k$-power, then $L(M)$ contains a $k$-power of length $\le kn^k$.

Proof. Consider the NFA-$\epsilon$ $M'$ accepting $L(M)^{1/k}$ defined in the proof of Proposition 18. The
only transitions from the start state of $M'$ are $\epsilon$-transitions to submachines whose states are
$(2k - 1)$-tuples of the form $[g_1, g_2, \ldots, g_{k-1}, p_0, p_1, \ldots, p_{k-1}]$, where the first $k - 1$ elements
of the tuple are fixed. Thus we may consider $L(M')$ as a finite union of languages, each
accepted by an NFA of size $n^k$. It follows that if $M'$ accepts a non-empty word $w$, it accepts
such a $w$ of length $\le n^k$. However, $M'$ accepts $w$ if and only if $M$ accepts $w^k$. We conclude
that if $L(M)$ contains a $k$-power, it contains one of length $\le kn^k$. □

We now give a lower bound on the size of the smallest $k$-power accepted by an $n$-state
DFA.

Proposition 44. Let $k \ge 2$ be an integer. There exist infinitely many DFAs $M_n$ such that

(a) $M_n$ has $O(kn)$ states;

(b) The shortest $k$-power accepted by $M_n$ is of length $k \cdot \Omega\left(\binom{n}{k}\right)$.

Proof. For $n \ge k$, let

$$L_n = (a^n)^+ b\, (a^{n-1})^+ b \cdots (a^{n-k+1})^+ b.$$

Then $L_n$ is accepted by a DFA with $O(kn)$ states, and the shortest $k$-power in $L_n$ is $(a^\ell b)^k$,
where

$$\ell = \mathrm{lcm}(n, n - 1, \ldots, n - k + 1) \ge n(n - 1) \cdots (n - k + 1)/k! = \binom{n}{k},$$

as required. □
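
For small parameters the construction in this proof can be checked mechanically. The sketch below is an illustration only; the helper names are hypothetical. It relies on the observation that any $k$-power in $L_n$ contains exactly $k$ occurrences of $b$ and ends in $b$, so its root has the form $a^m b$, and it searches for the least such $m$ using a regular expression for $L_n$.

```python
import re
from math import comb, lcm

def predicted_length(n, k):
    """Length of (a^l b)^k with l = lcm(n, n-1, ..., n-k+1)."""
    l = lcm(*range(n - k + 1, n + 1))
    return k * (l + 1)

def in_L(word, n, k):
    """Membership in L_n = (a^n)+ b (a^(n-1))+ b ... (a^(n-k+1))+ b."""
    pattern = "".join(f"(?:a{{{n - i}}})+b" for i in range(k))
    return re.fullmatch(pattern, word) is not None

def brute_force_length(n, k, limit):
    """Length of the shortest k-power (a^m b)^k in L_n with 1 <= m <= limit, else None."""
    for m in range(1, limit + 1):
        if in_L(("a" * m + "b") * k, n, k):
            return k * (m + 1)
    return None
```

The brute-force search and the lcm formula can then be compared for a few small values of $n$ and $k$, and the lcm can be compared against $\binom{n}{k}$ with `math.comb`.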

Next we consider the length of a smallest power (rather than $k$-power).

Proposition 45. Let $M$ be an NFA with $n$ states. If $L(M)$ contains a power, it contains a
$k$-power for some $k$, $2 \le k \le n + 1$.

Proof. Suppose to the contrary that the smallest $k$ for which $L(M)$ contains a $k$-power $w^k$
satisfies $k > n + 1$. For some accepting computation of $M$ on $w^k$ let $q_1, q_2, \ldots, q_{k-1}$ be the
states reached by $M$ after reading $w, w^2, \ldots, w^{k-1}$ respectively. Since $k > n + 1$, there exist
$i$ and $j$ where $1 \le i < j \le k - 1$ and $q_i = q_j$. Cutting out the part of the computation between
$q_i$ and $q_j$, it follows that $M$ accepts $w^\ell$ for $\ell = k - (j - i)$, where
$2 \le \ell < k$, contradicting the minimality of $k$. We conclude that if $L(M)$ contains a $k$-power,
we may take $k \le n + 1$. □

Proposition 46. Let $M$ be an NFA with $n$ states. If $L(M)$ contains a power, then $L(M)$
contains a power of length $\le (n + 1)n^{n+1}$.

Proof. Apply Propositions 43 and 45. □

We now give a lower bound.

Proposition 47. There exist infinitely many DFAs $M_n$ such that

• $M_n$ has $O(n)$ states;

• The shortest power accepted by $M_n$ is of length $e^{\Omega(\sqrt{n \log n})}$.

Proof. Let $p_i$ denote the $i$-th prime number. For any integer $n \ge 2$, let $P(n) = p_k$ be the
largest prime number such that $p_1 + p_2 + \cdots + p_k \le n$. We define

$$L_n = (a^{p_1})^+ b\, (a^{p_2})^+ b \cdots (a^{p_k})^+ b.$$

Then $L_n$ is accepted by a DFA with $O(n)$ states.

If $k$ is itself prime, the shortest power in $L_n$ is $w = (a^\ell b)^k$, where $\ell = p_1 p_2 \cdots p_k$. For
$n \ge 2$, let

$$F(n) = \prod_{p \le P(n)} p,$$

where the product is over primes $p$. We have $F(n) \in e^{\Omega(\sqrt{n \log n})}$ [21, Theorem 1]. This lower
bound is valid for all sufficiently large $n$; in particular, it holds for infinitely many $n$ such
that $n = p_1 + p_2 + \cdots + p_k$, where $k$ is prime. This gives the desired result. □
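
The quantities in this proof are easy to tabulate for small $n$. The following sketch is illustrative only; the function names are hypothetical, and the code computes the primes $p_1, \ldots, p_k$ with $p_1 + \cdots + p_k \le n$ and the exponent $\ell = p_1 p_2 \cdots p_k$. It does not verify the asymptotic bound, which is quoted from [21].

```python
from math import prod

def primes_with_sum_at_most(n):
    """Return [p_1, ..., p_k], the longest initial run of primes whose sum is <= n."""
    primes, total, candidate = [], 0, 2
    while True:
        # trial division by all smaller primes (all of them are in the list so far)
        if all(candidate % p for p in primes):
            if total + candidate > n:
                break
            primes.append(candidate)
            total += candidate
        candidate += 1
    return primes

def exponent_and_length(n):
    """Return (k, l, k * (l + 1)) where l = p_1 * ... * p_k.

    The last entry is the length of (a^l b)^k; by the proof above this is the
    shortest power in L_n when k is prime.
    """
    ps = primes_with_sum_at_most(n)
    k, l = len(ps), prod(ps)
    return k, l, k * (l + 1)
```

For example, $n = 28 = 2 + 3 + 5 + 7 + 11$ uses $k = 5$ primes, and $5$ is itself prime, so this is one of the values of $n$ to which the argument applies.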



9 Additional results on powers 

Dömösi, Martín-Vide, and Mitrana [H Theorem 10] proved that if $L$ is a slender regular
language over $\Sigma$, and $Q_\Sigma$ is the set of primitive words over $\Sigma$, then $L \cap Q_\Sigma$ is regular. This
result is somewhat surprising, since it is widely believed that $Q_\Sigma$ is not even context-free for
$|\Sigma| \ge 2$. In this section we apply a variation of their argument to show that $Q_\Sigma$ may be
replaced by the language of squares (cubes, etc.) over $\Sigma$.

For any integer $k \ge 2$ and alphabet $\Sigma$, let $P(k, \Sigma)$ denote the set of $k$-powers over $\Sigma$.
Clearly, for $|\Sigma| \ge 2$, $P(k, \Sigma)$ is not context-free.

Proposition 48. If $L \subseteq \Sigma^*$ is a slender regular language, then for all integers $k \ge 2$,
$L \cap P(k, \Sigma)$ is regular.

Proof. If $L$ is slender, then by Theorem [TT1 it suffices to consider $L = uv^*w$. The result is
clearly true if $v$ is empty, so we suppose $v$ is non-empty. Let $x$ and $y$ be the primitive roots of
$v$ and $wu$ respectively. If $x = y$, then the set of $k$-powers in $v^*wu$ is given by $v^*wu \cap (x^k)^*$, so
the set of $k$-powers in $uv^*w$ is regular. If $x \ne y$, then by Theorem 29 the set $v^*wu$ contains
only finitely many $k$-powers. The set of $k$-powers in $uv^*w$ is therefore finite, and, a fortiori,
regular. □



10 Testing if an NFA accepts a bordered word 

In this section we give an efficient algorithm to test if an NFA accepts a bordered word. We 
also give upper and lower bounds on the length of a shortest bordered word accepted by an 
NFA. 

Proposition 49. Given an NFA $M$ with $n$ states and $t$ transitions, we can decide if $M$
accepts at least one bordered word in $O(n^3 t^2)$ time.

Proof. Given an NFA $M = (Q, \Sigma, \delta, q_0, F)$, we can easily create an NFA-$\epsilon$ $M'$ that accepts

$$\{u \in \Sigma^* : \text{there exists } w \in \Sigma^* \text{ such that } uwu \in L(M)\}$$

by "guessing" the state we would be in after reading $uw$, and then verifying it. More formally,
we let $M' = (Q', \Sigma, \delta', q_0', F')$ where $Q' = \{q_0'\} \cup \{[p, q, r] : p, q, r \in Q\}$ and
$F' = \{[p, q, r] : r \in F$ and there exists $w \in \Sigma^*$ such that $q \in \delta(p, w)\}$. The transitions are defined as follows:
$\delta'(q_0', \epsilon) = \{[q_0, p, p] : p \in Q\}$ and

$$\delta'([p, q, r], a) = \{[p', q, r'] : p' \in \delta(p, a),\ r' \in \delta(r, a)\}.$$

If $M$ has $n$ states and $t$ transitions, then $M'$ has $n^3 + 1$ states and at most $n + n^3 t^2$ transitions.
Now get rid of all useless states and their associated transitions. We can compute the final
states by doing $n$ depth-first searches in $M$, starting at each node, at a cost of $O(n(n + t))$ time.
Now we just test to see if $M'$ accepts a nonempty string, which can be done in linear
time in the size of $M'$. □
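
The triple construction can be sketched directly as a search over reachable triples $[p, q, r]$. This is an illustrative implementation, not tuned to the stated time bound; the NFA representation (a dictionary `delta` mapping `(state, letter)` to a set of states) and the function name are assumptions. Here $p$ tracks the run on the first copy of $u$, $q$ is the guessed state after $uw$, and $r$ tracks the run on the second copy of $u$ started from $q$.

```python
from collections import deque

def accepts_bordered_word(states, alphabet, delta, start, finals):
    """True iff the NFA accepts some word u w u with u non-empty."""
    def step(q, a):
        return delta.get((q, a), ())

    # reach[p]: states q with q in delta(p, w) for some word w (possibly empty)
    reach = {}
    for p in states:
        seen, stack = {p}, [p]
        while stack:
            s = stack.pop()
            for a in alphabet:
                for t in step(s, a):
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
        reach[p] = seen

    # Seed the search with the triples reached after one letter from [start, g, g],
    # so that every triple in the queue corresponds to a non-empty u.
    queue, visited = deque(), set()
    for g in states:
        for a in alphabet:
            for p2 in step(start, a):
                for r2 in step(g, a):
                    triple = (p2, g, r2)
                    if triple not in visited:
                        visited.add(triple)
                        queue.append(triple)
    while queue:
        p, q, r = queue.popleft()
        if r in finals and q in reach[p]:
            return True  # final triple of M': r is accepting and some w leads from p to q
        for a in alphabet:
            for p2 in step(p, a):
                for r2 in step(r, a):
                    triple = (p2, q, r2)
                    if triple not in visited:
                        visited.add(triple)
                        queue.append(triple)
    return False
```

For the three-state NFA accepting $ab^*a$ the search succeeds (the word $aa$ is bordered), while for an NFA accepting only $ab$ it fails.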

Corollary 50. If $M$ is an NFA with $n$ states, and it accepts at least one bordered word, it
must accept a bordered word of length $< 2n^2 + n$.

Proof. Consider the NFA-$\epsilon$ $M'$ constructed in the proof of Proposition 49, which accepts

$$L' = \{u \in \Sigma^* : \text{there exists } w \in \Sigma^* \text{ such that } uwu \in L(M)\}.$$

If $M$ accepts a bordered string, then $M'$ accepts a nonempty string. Although $M'$ has $n^3 + 1$
states, once a computation leaves $q_0'$ and enters a triple of the form $[p, q, r]$, it never enters
a state $[p', q', r']$ with $q \ne q'$. Thus we may view the NFA $M'$ as implicitly defining a union
of $n$ disjoint languages, each accepted by an NFA with $n^2$ states. Therefore, if $M'$ accepts
a nonempty string $u$, it accepts one of length at most $n^2$. Now the corresponding bordered
string is $uwu$. The string $w$ is implicitly defined in the previous proof as a path from a
state $p$ to a state $q$. If such a path exists, it is of length at most $n - 1$. Thus there exists
$uwu \in L(M)$ with $|uwu| \le 2n^2 + n - 1$. □

Proposition 51. For infinitely many $n$ there is a DFA of $n$ states such that the shortest
bordered word accepted is of length $n^2/2 - 6n + 43/2$.

Proof. Consider $a(b^t)^+ c\, a(b^{t-1})^+ c$. An obvious DFA can accept this using $n = 2t + 5$ states.
However, the shortest bordered word accepted is $a b^{t(t-1)} c\, a b^{t(t-1)} c$, which is of length
$2t(t - 1) + 4 = n^2/2 - 6n + 43/2$. □

We now consider testing if an NFA accepts infinitely many bordered words. 

Corollary 52. If an NFA $M$ has $n$ states and $t$ transitions, we can test whether $M$ accepts
infinitely many bordered words in $O(n^6 t^2)$ time.

Proof. If an NFA $M$ accepts infinitely many words of the form $uwu$, there are two possibilities,
at least one of which must hold:

(a) there is a single word $u$ such that there are infinitely many $w$ with $uwu \in L(M)$, or

(b) there are infinitely many $u$, with possibly different $w$ depending on $u$, such that $uwu \in
L(M)$.

To check these possibilities, we return to the NFA-$\epsilon$ $M'$ constructed in the proof of
Proposition 49. First, for each pair of states $q_i$, $q_j$, we determine whether there exists a
nonempty path from $q_i$ to $q_j$. This can be done with $n$ different depth-first searches, starting
at each vertex, at a cost of $O(n^3(n^3 + t^2))$ time. In particular, for each vertex, we learn
whether there is a nonempty cycle beginning and ending at that vertex.

Now let us check whether (a) holds. After removing all useless states and their associated 
transitions, look at the remaining final states [p, q, r] of M' and determine if there is a path 
from p to q that goes through a vertex with a cycle. This can be done by testing, for each 
vertex s that has a cycle, whether there is a non-empty path from p to s and then s to q. If 
such a vertex exists, then there are infinitely many w in some uwu. 

To check whether (b) holds, we just need to know whether M' accepts infinitely many 
strings, which we can easily check by looking for a directed cycle. 

The total cost is therefore $O(n^3(n^3 t^2))$, that is, $O(n^6 t^2)$. □

We now prove the following decomposition theorem for regular languages consisting only 
of bordered words. 

Theorem 53. If every word in a regular language $L$ is bordered, then there is a decomposition
of $L$ as a finite union of regular languages of the form $JKJ$, where each $J$ and $K$ are regular
and $\epsilon \notin J$.

Proof. Let $L$ be accepted by an NFA $M = (Q, \Sigma, \delta, q_0, F)$. For each $x \in \Sigma^+$, define an
automaton $M_x = (Q, \Sigma, \delta, I', F')$ (for $M_x$ we permit multiple initial states), where the set of
initial states is $I' = \delta(q_0, x)$, and the set of final states is $F' = \{q \in Q : \delta(q, x) \cap F \ne \emptyset\}$. Then
$M_x$ has the property that for every $w \in L(M_x)$, we have $xwx \in L(M)$. Note that there are
only finitely many distinct automata $M_x$.

For each automaton $M_x$, define the regular language

$$L_x = \{y : \delta(q_0, y) = I' \text{ and } \{q \in Q : \delta(q, y) \cap F \ne \emptyset\} = F'\}.$$

Note that again there are only finitely many distinct languages $L_x$.

For every $x \in \Sigma^+$, every word in $L_x L(M_x) L_x$ is in $L$. Furthermore, if $w \in L$ is bordered,
then there exists $x \in \Sigma^+$ such that $w \in L_x L(M_x) L_x$. Thus, if every word of $L$ is bordered,
then $L = \bigcup_{x \in \Sigma^+} L_x L(M_x) L_x$. Since there are only finitely many languages $L_x$ and $L(M_x)$,
this union is finite, as required. □

11 Testing if an NFA accepts an unbordered word 

We present a simple test to determine if all words in a regular language are bordered, and
to determine if a regular language contains infinitely many unbordered words. We first need
the following well-known result about words, which is due to Lyndon and Schützenberger.

Lemma 54. Suppose $x$, $y$ and $z$ are non-empty words, and that $xy = yz$. Then there is a
non-empty word $p$, a word $q$ and a non-negative integer $k_1$ for which we can write $x = pq$,
$z = qp$, and $y = (pq)^{k_1} p$.
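
The decomposition in Lemma 54 can be computed explicitly. The sketch below is an illustration with a hypothetical function name: it divides $|y|$ by $|x|$ and reads $p$ and $q$ off $x$, taking $p = x$ and $q$ empty when $|y|$ is a multiple of $|x|$ so that $p$ stays non-empty. The assertions inside the function re-check the three identities of the lemma.

```python
def conjugacy_decomposition(x, y, z):
    """Given non-empty x, y, z with x y == y z, return (p, q, k1) such that
    x == p q, z == q p and y == (p q)^k1 p, with p non-empty and k1 >= 0."""
    if not (x and y and z) or x + y != y + z:
        raise ValueError("need non-empty words with xy = yz")
    m, rem = divmod(len(y), len(x))
    if rem == 0:
        # y is a power of x: use p = x, q = empty, and one fewer copy of pq
        p, q, k1 = x, "", m - 1
    else:
        p, q, k1 = x[:rem], x[rem:], m
    assert x == p + q
    assert z == q + p
    assert y == (p + q) * k1 + p
    return p, q, k1
```

For example, with $x = ab$, $y = aba$, $z = ba$ the function returns $p = a$, $q = b$, $k_1 = 1$.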

We also need the following result, which is just a variation of the pumping lemma. 

Lemma 55. Let $M = (Q, \Sigma, \delta, q_0, F)$ be an $n$-state NFA. Let $L$ be the language accepted by
$M$. Let $d$ be a positive integer. Let $(X, y, Z)$ be a 3-tuple of words for which $|y|$ is a multiple
of $d$, $|y| \ge nd$ and $XyZ \in L$. Then there are words $r$, $s$ and $t$, whose lengths are multiples
of $d$, with $|s| \ge d$, for which we can write $y = rst$, and, for all $z \ge 0$, $X r s^z t Z \in L$.

Proof. Set $l := |X|$, $m := |y|/d$, $\gamma := XyZ$, and $k := |\gamma|$. First, write $\gamma$ as a sequence
of letters, that is, $\gamma := \gamma_1 \gamma_2 \cdots \gamma_k$ with each $\gamma_i$ a letter. By $\gamma[i, j]$ for $1 \le i, j \le |\gamma|$ we mean
the subsequence that consists of the $j - i + 1$ consecutive letters of $\gamma$ starting at position $i$
and ending at position $j$, that is, $\gamma_i \gamma_{i+1} \cdots \gamma_j$. If $i > j$ we take $\gamma[i, j]$ to be the empty word.

Now we have the following sequence of $k$ states

$$q_1 \in \delta(q_0, \gamma_1),\ q_2 \in \delta(q_1, \gamma_2),\ \ldots,\ q_k \in \delta(q_{k-1}, \gamma_k).$$

We'll choose $q_k$ to be a final state.

Note that $y = \gamma[l + 1, l + md]$, and consider the following sequence of $m + 1$ states of $M$:

$$q_l,\ q_{l+d},\ q_{l+2d},\ \ldots,\ q_{l+md}.$$

Since $m \ge n$, there are integers $i$ and $j$, with $0 \le i < j \le m$, for which $q_{l+id} = q_{l+jd}$. Set
$r := \gamma[l + 1, l + id]$, $s := \gamma[l + id + 1, l + jd]$, and $t := \gamma[l + jd + 1, l + md]$, so $y = rst$. Note that
$|s| \ge d$, and the desired conclusion follows immediately. □

Lemma 56. Let $M$ be an $n$-state NFA. Let $L$ be the language accepted by $M$. Let $(X, Y, Z)$
be a 3-tuple of words for which $XYZ \in L$. Then there is a word $y$ for which $|y| < n$ and
$XyZ \in L$.

Proof. Let $S := \{u \in \Sigma^* : XuZ \in L\}$. Let $y$ be an element of $S$ of minimal length. We
proceed by contradiction, and suppose $|y| \ge n$. We apply Lemma 55 to $(X, y, Z)$, with
$d = 1$, and write $y = rst$ with $s$ non-empty. Then $XrtZ \in L$, which violates the minimality
of $|y|$. □

Lemma 57. Suppose there are words $\Psi_L$, $\Psi_R$, $e$, $f$, $g$ and $h$ with $|\Psi_L| = |\Psi_R|$, $|e| < |\Psi_L|$,
$|g| < |\Psi_R|$, and for which

$$b_\zeta := \Psi_L e = f \Psi_R \tag{1}$$

and

$$b_\eta := \Psi_L g = h \Psi_R. \tag{2}$$

Suppose further that $|b_\eta| < |b_\zeta|$. Then we can write $\Psi_L = h(pq)^k p$ and $\Psi_R = (pq)^k p g$ for $p$ a
non-empty word, $q$ a word for which $|g| + |pq| = |f|$, and $k$ a positive integer.

Proof. Since $|b_\eta| < |b_\zeta|$, we must have $|g| < |e| < |\Psi_R|$. This last observation, together
with (1) and (2) above, allows us to assert that there are non-empty words $s_1$ and $s_2$, with
$|s_2| > |s_1|$, such that $\Psi_R = s_1 e = s_2 g$. This last fact combined again with (1) and (2) yields
that

$$\Psi_L = f s_1 = h s_2, \tag{3}$$

and

$$\Psi_R = s_1 e = s_2 g. \tag{4}$$

Now we can apply (3) and (4) to assert that there are non-empty words $r_1$ and $r_2$ for
which $s_1 r_1 = s_2 = r_2 s_1$, that is,

$$s_1 r_1 = r_2 s_1. \tag{5}$$

Now apply Lemma 54 to (5) to get that there is a non-empty word $p$, a word $q$ and
an integer $k_1 \ge 0$ for which $s_1 = (pq)^{k_1} p$, $r_1 = qp$, and $r_2 = pq$. Set $k := k_1 + 1$. Then
$s_2 = (pq)^k p$, and (3) gives $\Psi_L = h(pq)^k p$, and (4) gives $\Psi_R = (pq)^k p g$. Also $s_2 = r_2 s_1$
combined with (3) above gives that $f = h r_2$, so $|g| + |pq| = |h| + |pq| = |h| + |r_2| = |f|$. □

Theorems 58 and 67 below are the main results.

Theorem 58. Let $M$ be an $n$-state NFA. Let $L$ be the language accepted by $M$. Let $N$ be a
non-negative integer. Suppose all words in $L$ of length in the interval $[N, 2N + 6n + 1]$ are
bordered. Then all words in $L$ of length greater than $2N + 6n + 1$ are bordered. Hence, if all
words in $L$ of length at most $6n + 1$ are bordered, then all the words in $L$ must be bordered.

Proof. We'll prove Theorem 58 by making the following series of observations. Throughout,
we'll assume that all words in $L$ of length in the interval $[N, 2N + 6n + 1]$ are bordered,
and we'll assume $w$ is an unbordered word in $L$ for which $|w| > 2N + 6n + 1$, with $|w|$
minimal. We write $w$ as $u \theta v$ with $\theta$ a word for which $|\theta| \le 1$ and $u$ and $v$ words for which
$|u| = |v| > 3n + N$.

Claim 59. Write $u$ as $\Psi_L X_L$ and $v$ as $X_R \Psi_R$, for words $\Psi_L$, $X_L$, $\Psi_R$, $X_R$ for which $|X_L| =
|X_R| = n$. (So that $w$ is $\Psi_L X_L \theta X_R \Psi_R$.) Then there are words $x_L$ and $x_R$, both of length less
than $n$, for which:

(i) $\zeta := \Psi_L x_L \theta X_R \Psi_R \in L$, and

(ii) $\eta := \Psi_L X_L \theta x_R \Psi_R \in L$.

Further, $N \le |\zeta| < |w|$, and $N \le |\eta| < |w|$.

To justify (i), apply Lemma 56 to the 3-tuple $(\Psi_L, X_L, \theta X_R \Psi_R)$. Similarly, to arrive at
(ii), apply Lemma 56 again to the 3-tuple $(\Psi_L X_L \theta, X_R, \Psi_R)$.

Claim 60. We can write $\Psi_L = h(pq)^k p$ and $\Psi_R = (pq)^k p g$ for $p$ a non-empty word, $g$, $h$
and $q$ words for which $|g| = |h|$, $|pq| + |g| < n$, and $k$ a positive integer. Hence $w$ can be
written as $h(pq)^k p X_L \theta X_R (pq)^k p g$.

To justify Claim 60, first recall $w = \Psi_L X_L \theta X_R \Psi_R$ and $|\Psi_L| = |\Psi_R| > 2n$. From Claim 59
above we get that $\zeta$ and $\eta$ are bordered words, so we can assert that there exist non-empty
words $b_\zeta$ and $b_\eta$, and words $\rho_\zeta$ and $\rho_\eta$, for which:

(I) $\zeta = \Psi_L x_L \theta X_R \Psi_R = b_\zeta \rho_\zeta b_\zeta$, and

(II) $\eta = \Psi_L X_L \theta x_R \Psi_R = b_\eta \rho_\eta b_\eta$.

Note that, if $|b_\zeta| \le |\Psi_L|$ then by (I) $b_\zeta$ would be a border for $w$. So we must have
$|b_\zeta| > |\Psi_L|$. Similarly, (II) gives that $|b_\eta| > |\Psi_L|$. These latter facts together with (I) and
(II) give that there exist non-empty words $e$, $f$, $g$, $h$, for which $|e| = |f|$, $|g| = |h|$, and for
which

$$b_\zeta = \Psi_L e = f \Psi_R, \tag{6}$$

and

$$b_\eta = \Psi_L g = h \Psi_R. \tag{7}$$

Further, $|\zeta| < |w|$ implies that $|f| < n$, and similarly $|\eta| < |w|$ implies that $|h| < n$.

Suppose $|b_\eta| = |b_\zeta|$. Then from (6) and (7) above, $|e| = |g|$. But $e$ and $g$ are suffixes of
$\Psi_R$, so we get that $e = g$. Hence $b_\zeta = \Psi_L e = \Psi_L g = b_\eta$. Set $b := b_\zeta = b_\eta$. Then from (II)
above, as $|b| < |\Psi_L| + n$, $b$ is a prefix of $\Psi_L X_L$. And from (I) above, $b$ is a suffix of $X_R \Psi_R$.
So $b$ is a non-empty prefix of $w$, and a suffix of $w$. Hence, as $|b| \le |w|/2$, $b$ is a border for $w$.

So we must have $|b_\eta| \ne |b_\zeta|$. Suppose first that $|b_\eta| < |b_\zeta|$. Now apply Lemma 57 to get
that there is a positive integer $k$, a non-empty word $p$ and a word $q$ for which $\Psi_L = h(pq)^k p$
and $\Psi_R = (pq)^k p g$. And finally observe that $|pq| + |g| = |f| < n$. If $|b_\eta| > |b_\zeta|$, the argument
is similar, so Claim 60 is established.

Claim 61. Let $x := pq$ in the statement of Claim 60. There is a conjugate $c_L$ of $x$ which is
a prefix of $\Psi_L$, and there is a conjugate $c_R$ of $x$ which is a suffix of $\Psi_R$.

To justify Claim 61, let $S_L$ be the prefix of length $n$ of $\Psi_L$. So there is a word $T_L$ for which
we can write $\Psi_L X_L \theta X_R = S_L T_L$. (So $w$ is $S_L T_L \Psi_R$.) Now apply Lemma 56 to $(S_L, T_L, \Psi_R)$,
obtaining a word $t_L$, with $|t_L| < n$, for which $w_1 := S_L t_L \Psi_R \in L$. By supposition, since
$N \le |w_1| < |w|$, $w_1$ has a border, say $b_1$. Further, if $|b_1| \le n$ then $b_1$ would be a border for
$w$. So we must have $|b_1| > n$. And $|b_1| \le |w_1|/2$ implies $|b_1| < |\Psi_R|$.

So $b_1$ is a suffix of $\Psi_R$ of length greater than $n$; hence by Claim 60 above we can write
$b_1 = s_x x^{k_2} p g$ for some integer $k_2 \ge 0$, with $s_x$ a suffix of $x$. Write $x = p_x s_x$, and recall that
$p$ is a prefix of $x$. Then $|s_x x^{k_2} p g| > n$ and $|x| + |g| < n$ (from Claim 60) yields that $s_x p_x$ is a
prefix of $s_x x^{k_2} p g$, that is, $s_x p_x$ is a prefix of $b_1$. So set $c_L := s_x p_x$. Since $b_1$ is a prefix of $w_1$,
$c_L$ must be a prefix of $w_1$, and $|c_L| < n = |S_L|$ gives that $c_L$ is a prefix of $S_L$, and the first
statement of Claim 61 follows.

To get the second statement of Claim 61, similarly let $S_R$ be the suffix of length $n$ of
$\Psi_R$. So there is a word $T_R$ for which we can write $X_L \theta X_R \Psi_R = T_R S_R$. (So $w$ is $\Psi_L T_R S_R$.)
Now apply Lemma 56 to $(\Psi_L, T_R, S_R)$, obtaining a word $t_R$, with $|t_R| < n$, for which $w_2 :=
\Psi_L t_R S_R \in L$. By supposition, since $N \le |w_2| < |w|$, $w_2$ has a border, say $b_2$. Further, if
$|b_2| \le n$ then $b_2$ would be a border for $w$. So we can assert that $n < |b_2| < |\Psi_L|$.

So $b_2$ is a prefix of $\Psi_L$ of length greater than $n$; hence by Claim 60 we can write $b_2 = h x^{k_3} p_x$
for some integer $k_3 \ge 0$, with $p_x$ a prefix of $x$. Write $x = p_x \sigma_x$. Then $|h x^{k_3} p_x| > n$ and
$|x| + |h| < n$ (from Claim 60) yields that $\sigma_x p_x$ is a suffix of $h x^{k_3} p_x$, that is, $\sigma_x p_x$ is a suffix
of $b_2$. So set $c_R := \sigma_x p_x$. Since $b_2$ is a suffix of $w_2$, $c_R$ must be a suffix of $w_2$, and also
$|c_R| < n = |S_R|$ yields that $c_R$ is a suffix of $S_R$, and the second statement of Claim 61
follows.

To complete the proof of Theorem 58, note that, since $c_L$ and $c_R$ are both conjugates of
$x$, $c_L$ and $c_R$ are non-empty words which are conjugates. So there is a non-empty word $\alpha$
and a word $\beta$ for which we can write $c_L = \alpha\beta$ and $c_R = \beta\alpha$. Then $\alpha$ is a prefix of $\Psi_L$, and
$\alpha$ is a suffix of $\Psi_R$, which gives that $\alpha$ is a border for $w$, and gives a contradiction. □

Corollary 62. The problem of determining if an NFA accepts an unbordered word is
decidable.

Proof. Let M be an NFA with n states. To determine if M accepts an unbordered word, it 
suffices to test whether M accepts an unbordered word of length at most 6n + 1. □ 
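
The resulting decision procedure can be sketched as an exhaustive search. This is a brute-force illustration that may take exponential time, in line with the remark that no polynomial-time algorithm is known; the NFA representation (a dictionary `delta` mapping `(state, letter)` to a set of states) and the function names are assumptions. The search covers lengths $1$ through $6n + 1$ and leaves the empty word aside.

```python
def is_bordered(w):
    """True iff some non-empty proper prefix of w is also a suffix of w."""
    return any(w[:i] == w[-i:] for i in range(1, len(w)))

def find_unbordered_word(delta, start, finals, alphabet, n):
    """Return a shortest unbordered accepted word of length 1..6n+1, or None."""
    finals = set(finals)
    frontier = [("", frozenset([start]))]
    for _ in range(6 * n + 1):
        nxt = []
        for prefix, current in frontier:
            for a in alphabet:
                # subset simulation: all states reachable on prefix + a
                succ = frozenset(t for q in current for t in delta.get((q, a), ()))
                if not succ:
                    continue  # dead prefix, no accepted extension
                word = prefix + a
                if succ & finals and not is_bordered(word):
                    return word
                nxt.append((word, succ))
        frontier = nxt
    return None
```

By Theorem 58 (with $N = 0$), a result of `None` means that every non-empty word accepted by the $n$-state NFA is bordered.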

We do not know if there is a polynomial-time algorithm to test if an NFA accepts an 
unbordered word or if the problem is computationally intractable. 

Theorem 58 gives an upper bound of $6n + 1$ on the length of a shortest unbordered word
accepted by an $n$-state NFA. The best lower bound we are able to come up with is $2n - 3$, as
illustrated by the following example: an NFA of $n$ states accepts $ab^{n-3}ab^*$, and the shortest
unbordered word accepted is $ab^{n-3}ab^{n-2}$, which is of length $2n - 3$.

Theorem 63. Let $M$ be an $n$-state NFA, and let $L$ be the language accepted by $M$. Suppose
there is an unbordered word in $L$ of length greater than $4n^2 + 6n + 1$. Then $L$ contains
infinitely many unbordered words.

Proof. Suppose $L$ contains only finitely many unbordered words. Let $w$ be an unbordered
word in $L$ of length greater than $4n^2 + 6n + 1$, with $|w|$ maximal. Write $w$ as $\Psi_L X_L \theta X_R \Psi_R$
for words $\Psi_L$, $X_L$, $\theta$, $X_R$, $\Psi_R$ for which $|X_L| = |X_R| = n$, $|\Psi_L| = |\Psi_R| > 2n^2 + 2n$, and
$|\theta| \le 1$. We proceed by making the following series of observations.

Claim 64. There are words $x_L$, $u_L$, $y_L$ and $x_R$, $u_R$, $y_R$, with $u_L$ and $u_R$ both non-empty,
$X_L = x_L u_L y_L$, $X_R = x_R u_R y_R$, and for which:

(i) $\zeta := \Psi_L x_L u_L u_L y_L \theta X_R \Psi_R \in L$, and

(ii) $\eta := \Psi_L X_L \theta x_R u_R u_R y_R \Psi_R \in L$.

Further, $|\zeta| > |w|$, and $|\eta| > |w|$.

To justify (i), apply Lemma 55 (with $d = 1$) to the 3-tuple $(\Psi_L, X_L, \theta X_R \Psi_R)$. Similarly,
to arrive at (ii), apply Lemma 55 again (also with $d = 1$) to the 3-tuple $(\Psi_L X_L \theta, X_R, \Psi_R)$.

Claim 65. We can write $\Psi_L = h(pq)^k p$ and $\Psi_R = (pq)^k p g$ for $p$ a non-empty word, $g$, $h$ and
$q$ words for which $|g| = |h|$, $|pq| + |g| < 2n$, and $k$ an integer $> n$. Hence $w$ can be written
as $h(pq)^k p X_L \theta X_R (pq)^k p g$.

To justify Claim 65, first recall that $w = \Psi_L x_L u_L y_L \theta x_R u_R y_R \Psi_R$, and $X_L = x_L u_L y_L$,
$X_R = x_R u_R y_R$. From Claim 64 above and the maximality of $|w|$ we get that $\zeta$ and $\eta$ are
bordered words, so we can assert that there exist non-empty words $b_\zeta$ and $b_\eta$, and words $\rho_\zeta$
and $\rho_\eta$, for which:

(I) $\zeta = \Psi_L x_L u_L u_L y_L \theta X_R \Psi_R = b_\zeta \rho_\zeta b_\zeta$, and

(II) $\eta = \Psi_L X_L \theta x_R u_R u_R y_R \Psi_R = b_\eta \rho_\eta b_\eta$.

Note that, if $|b_\zeta| \le |\Psi_L|$ then by (I) $b_\zeta$ would be a border for $w$. So we must have
$|b_\zeta| > |\Psi_L|$. Similarly, (II) gives that $|b_\eta| > |\Psi_L|$. These latter facts together with (I) and
(II) give that there exist non-empty words $e$, $f$, $g$, $h$, for which $|e| = |f|$, $|g| = |h|$, and for
which

$$b_\zeta = \Psi_L e = f \Psi_R, \tag{8}$$

and

$$b_\eta = \Psi_L g = h \Psi_R. \tag{9}$$

Further, the reader can verify that $|e| < 2n < |\Psi_R|$, and $|g| < 2n < |\Psi_R|$.

Suppose $|b_\eta| = |b_\zeta|$. Then from (8) and (9) above, $|e| = |g|$. But $e$ and $g$ are suffixes
of $\Psi_R$, so we get that $e = g$. Hence $b_\zeta = \Psi_L e = \Psi_L g = b_\eta$. Set $b := b_\zeta = b_\eta$. Now
$|u_L y_L \theta X_R| > |x_L u_L|$, so from (I) above, we must have $|b| < |u_L y_L \theta X_R \Psi_R|$, that is, $b$ is
a suffix of $u_L y_L \theta X_R \Psi_R$. Similarly, $|X_L \theta x_R u_R| > |u_R y_R|$, so from (II) above we get that
$|b| < |\Psi_L X_L \theta x_R u_R|$, that is, $b$ is a prefix of $\Psi_L X_L \theta x_R u_R$. So $b$ is a non-empty prefix of $w$,
and a suffix of $w$. Hence $w$ must be bordered, which is a contradiction.

So we must have $|b_\eta| \ne |b_\zeta|$. First, suppose $|b_\eta| < |b_\zeta|$. Now apply Lemma 57 to get that
there is a positive integer $k$, a non-empty word $p$ and a word $q$ for which $\Psi_L = h(pq)^k p$ and
$\Psi_R = (pq)^k p g$. And finally observe that $|pq| + |g| = |f| < 2n$, and since $|\Psi_L| > 2n^2 + 2n$ and
$|pq| < 2n$, we get that $k > n$. The case $|b_\eta| > |b_\zeta|$ is symmetric, so Claim 65 is established.






Claim 66. Let $x := pq$ in the statement of Claim 65. There is a conjugate $c_L$ of $x$ which is
a prefix of $\Psi_L$, and there is a conjugate $c_R$ of $x$ which is a suffix of $\Psi_R$.

To justify Claim 66, recall from Claim 65 that $w$ is $\Psi_L X_L \theta X_R x^k p g$. And since $k > n$,
we can apply Lemma 55 to the 3-tuple of words $(\Psi_L X_L \theta X_R, x^k, pg)$, with $d := |x|$, obtaining
a positive integer $j_1$ for which, for all $z \ge 0$, we have $\Psi_L X_L \theta X_R x^{k + j_1 z} p g \in L$. So choose
$z_1 := |\Psi_L X_L \theta X_R|$, and define $w_1 := \Psi_L X_L \theta X_R x^{k + j_1 z_1} p g$. By supposition $w_1$ is a bordered
word, say with border $b_1$. Further, if $|b_1| \le |\Psi_R|$ then $b_1$ would be a border for $w$. So we
must have $|b_1| > |\Psi_R|$. And $|b_1| \le |w_1|/2$ implies $|b_1| < |x^{k + j_1 z_1} p g|$.

So $b_1$ is a suffix of $x^{k + j_1 z_1} p g$ of length greater than $|\Psi_R| > 2n$; hence by Claim 65 above
we can write $b_1 = s_x x^{k_2} p g$ for some integer $k_2 \ge 0$, with $s_x$ a suffix of $x$. Write $x = p_x s_x$,
and recall that $p$ is a prefix of $x$. Then $|s_x x^{k_2} p g| > 2n$ and $|x| + |g| < 2n$ (from Claim 65)
yields that $s_x p_x$ is a prefix of $s_x x^{k_2} p g$, that is, $s_x p_x$ is a prefix of $b_1$. So set $c_L := s_x p_x$. Since
$b_1$ is a prefix of $w_1$, $c_L$ must be a prefix of $w_1$, and $|c_L| < 2n$ gives that $c_L$ is a prefix of $\Psi_L$,
and the first statement of Claim 66 follows.

To justify the second statement of Claim 66 we proceed similarly; that is, we recall
that $w$ is $h x^k p X_L \theta X_R \Psi_R$, and apply Lemma 55 to the 3-tuple of words $(h, x^k, p X_L \theta X_R \Psi_R)$,
with $d := |x|$, allowing us to assert that there is a positive integer $j_2$ for which, for all
$z \ge 0$, we have $h x^{k + j_2 z} p X_L \theta X_R \Psi_R \in L$. So choose $z_2 := |X_L \theta X_R \Psi_R|$ and define $w_2 :=
h x^{k + j_2 z_2} p X_L \theta X_R \Psi_R$. By supposition $w_2$ is a bordered word, say with border $b_2$. Further, if
$|b_2| \le |\Psi_L|$ then $b_2$ would be a border for $w$. So we must have $|b_2| > |\Psi_L|$. And $|b_2| \le |w_2|/2$
implies $|b_2| < |h x^{k + j_2 z_2} p|$.

So $b_2$ is a prefix of $h x^{k + j_2 z_2} p$ of length greater than $|\Psi_L| > 2n$; hence by Claim 65 we
can write $b_2 = h x^{k_3} p_x$ for some integer $k_3 \ge 0$, with $p_x$ a prefix of $x$. Write $x = p_x \sigma_x$. Then
$|h x^{k_3} p_x| > 2n$ and $|x| + |h| < 2n$ (from Claim 65) yields that $\sigma_x p_x$ is a suffix of $h x^{k_3} p_x$, that
is, $\sigma_x p_x$ is a suffix of $b_2$. So set $c_R := \sigma_x p_x$. Since $b_2$ is a suffix of $w_2$, $c_R$ must be a suffix of
$w_2$, and also $|c_R| < 2n$ yields that $c_R$ is a suffix of $\Psi_R$, and the second statement of Claim 66
follows.

To complete the proof of Theorem 63, note that, since $c_L$ and $c_R$ are both conjugates of
$x$, $c_L$ and $c_R$ are non-empty words which are conjugates. So there is a non-empty word $\alpha$
and a word $\beta$ for which we can write $c_L = \alpha\beta$ and $c_R = \beta\alpha$. Then $\alpha$ is a prefix of $\Psi_L$, and
$\alpha$ is a suffix of $\Psi_R$, which gives that $\alpha$ is a border for $w$, which is a contradiction. So we're
forced to conclude that $L$ contains infinitely many unbordered words. □

Theorem 67. Let $M$ be an $n$-state NFA, and let $L$ be the language accepted by $M$. Then
the following are equivalent:

1. $L$ contains infinitely many unbordered words.

2. There is an unbordered word $w$ in $L$, with $4n^2 + 6n + 2 \le |w| \le 8n^2 + 18n + 5$.

Proof. (1) $\Rightarrow$ (2). Suppose all words $w \in L$ whose lengths are in $[4n^2 + 6n + 2, 8n^2 + 18n + 5]$
are bordered words. Then by Theorem 58 (with $N = 4n^2 + 6n + 2$, so that $2N + 6n + 1 = 8n^2 + 18n + 5$),
we have that any word in $L$ whose length is at least $4n^2 + 6n + 2$ is bordered, i.e., $L$ contains
at most finitely many unbordered words.

(2) $\Rightarrow$ (1). This follows immediately from Theorem 63. □



Corollary 68. The problem of determining if an NFA accepts infinitely many unbordered
words is decidable.

Proof. Let $M$ be an NFA with $n$ states. To determine if $M$ accepts infinitely many unbordered
words, it suffices to test whether $M$ accepts an unbordered word $w$, where $4n^2 + 6n + 2 \le
|w| \le 8n^2 + 18n + 5$. □

We do not know if there is a polynomial-time algorithm to test if an NFA accepts infinitely 
many unbordered words or if the problem is computationally intractable. 

12 Final remarks 

In this paper we examined the complexity of checking various properties of regular languages, 
such as consisting only of palindromes, containing at least one palindrome, consisting only 
of powers, or containing at least one power. In each case (except for the unbordered words), 
we were able to provide an efficient algorithm or show that the problem is likely to be hard. 
Our results are summarized in the following table. Here M is an NFA with n states and t 
transitions. When L is the language of unbordered words, it is an open problem to either 
find polynomial-time algorithms to test if (a) $L(M) \cap L = \emptyset$, and (b) $L(M) \cap L$ is infinite,
or to show the intractability of these problems. 